Generation of high dynamic range images from low dynamic range images

ABSTRACT

An approach is provided for generating a high dynamic range image from a low dynamic range image. The generation is performed using a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values. The mapping is generated from a reference low dynamic range image and a corresponding reference high dynamic range image. Thus, a mapping from the low dynamic range image to a high dynamic range image is generated on the basis of corresponding reference images. The approach may be used for prediction of high dynamic range images from low dynamic range images in an encoder and decoder. A residual image may be generated and used to provide improved high dynamic range image quality.

FIELD OF THE INVENTION

The invention relates to generation of high dynamic range images from low dynamic range images and in particular, but not exclusively, to generation of high dynamic range video sequences from low dynamic range video sequences.

BACKGROUND OF THE INVENTION

Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication increasingly has replaced analogue representation and communication. Continuous research and development is ongoing in how to improve the quality that can be obtained from encoded images and video sequences while at the same time keeping the data rate to acceptable levels.

An important factor for perceived image quality is the dynamic range that can be reproduced when an image is displayed. However, conventionally, the dynamic range of reproduced images has tended to be substantially reduced in relation to normal vision. Indeed, luminance levels encountered in the real world span a dynamic range as large as 14 orders of magnitude, varying from a moonless night to staring directly into the sun. Traditionally, dynamic range of image sensors and displays has been confined to about 2-3 orders of magnitude. Consequently, it has traditionally been possible to store and transmit images in 8-bit gamma-encoded formats without introducing perceptually noticeable artifacts on traditional rendering devices. However, in an effort to record more precise and livelier imagery, novel High Dynamic Range (HDR) image sensors that are capable of recording dynamic ranges of more than 6 orders of magnitude have been developed. Moreover, most special effects, computer graphics enhancement and other post-production work are already routinely conducted at higher bit depths.

Furthermore, the contrast and peak luminance of state-of-the-art display systems continue to increase. Recently, new prototype displays have been presented with a peak luminance as high as 3000 cd/m² and above, and contrast ratios of 4 orders of magnitude and above. When traditionally encoded 8-bit signals are displayed on such displays, annoying quantization and clipping artifacts may appear. Moreover, traditional video formats offer insufficient headroom and accuracy to convey the rich information contained in new HDR imagery.

As a result, there is a growing need for new video formats that allow a consumer to fully benefit from the capabilities of state-of-the-art sensors and display systems. Preferably, such formats are backwards-compatible such that legacy equipment can still receive ordinary video streams, while new HDR-enabled devices take full advantage of the additional information conveyed by the new format. Thus, it is desirable that encoded video data not only represents the HDR images but also allows encoding of traditional Low Dynamic Range (LDR) images that can be displayed on conventional equipment.

The most straightforward approach would be to compress and store LDR and HDR streams independently of each other (simulcast). However, this would result in a high data rate. In order to improve the compression efficiency, it has been proposed to employ inter-layer prediction where HDR data is predicted from an LDR stream, such that only the smaller differences between the actual HDR data and its prediction need to be encoded and stored/transmitted.

However, prediction of HDR from LDR data tends to be difficult and relatively inaccurate. Indeed, the relationship between corresponding LDR and HDR tends to be very complex and may often vary strongly between different parts of the image. For example, an LDR image may often be generated by tone mapping and color grading of an HDR image. The exact tone mapping/color grading, and thus the relationship between the HDR and LDR images, will depend on the specific algorithm and parameters chosen for the color grading and is thus likely to vary depending on the source. Indeed, color grading may often be subjectively and individually modified not only for different content items but also between different images and indeed very often between different parts of an image. For example, a color grader may select different objects in an image and apply separate and individual color grading to each object. Consequently, prediction of HDR images from LDR images is typically very difficult and ideally requires adaptation to the specific approach used to generate the LDR image from the HDR image.

An example of an approach for predicting an HDR image is presented in Mantiuk, R., Efremov, A., Myszkowski, K., and Seidel, H. 2006. Backward compatible high dynamic range MPEG video compression. ACM Trans. Graph. 25, 3 (July 2006), 713-723. In this approach a global reconstruction function is estimated and used to perform the inter-layer prediction. However, the approach tends to result in suboptimal results and tends to be less accurate than desired. In particular, the use of a global reconstruction function tends to allow only a rough estimation as it cannot take into account local variations in the relationship between HDR and LDR data, e.g. caused by application of a different color grading.

Another approach is proposed in US Patent Application US2009/0175338 wherein a mechanism for inter-layer prediction that operates on a macroblock (MB) level is presented. In the approach, the HDR stream is for each macroblock locally predicted by estimating a scale and offset parameter, which corresponds to a linear regression of the macroblock data. However, although this may allow a more local prediction, the simplicity of the linear model applied often fails to accurately describe the intricate relations between LDR and HDR data, particularly in the vicinity of high-contrast and color edges.

Hence, an improved approach for encoding HDR/LDR data and/or for generating HDR data from LDR data would be advantageous. In particular a system allowing for increased flexibility, facilitated implementation and/or operation, improved and/or automated adaptation, increased accuracy, reduced encoding data rates and/or improved performance would be advantageous.

SUMMARY OF THE INVENTION

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.

According to an aspect of the invention there is provided a method of encoding an input image, the method comprising: receiving the input image; generating a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values in response to a reference low dynamic range image and a corresponding reference high dynamic range image; and generating an output encoded data stream by encoding the input image in response to the mapping. Note that the high dynamic range pixel values need not be exactly the single value present at e.g. a spatial subsampled position, but can e.g. also be a derived value, e.g. an average of neighboring values in the high dynamic range picture, or an archetypical HDR value for that subsampled position.

The invention may provide an improved encoding. For example, it may allow encoding to be adapted and targeted to specific dynamic range characteristics, and in particular to characteristics associated with dynamic range expansion techniques that may be performed by a suitable decoder. The invention may for example provide an encoding that may allow a decoder to enhance a received encoded low dynamic range image to a high dynamic range image. The use of a mapping based on reference images may in particular in many embodiments allow an automated and/or improved adaptation to image characteristics without requiring predetermined rules or algorithms to be developed and applied for specific image characteristics.

The image positions that may be considered to be associated with the combination may for a specific input set e.g. be determined as the image positions that meet a neighborhood criterion for the image spatial positions for the specific input set. For example, it may include image positions that are less than a given distance from the position of the input set, that belong to the same image object as the position of the input set, that fall within position ranges defined for the input set, etc.

The combination may for example be a combination that combines a plurality of color coordinate values into fewer values, and specifically into a single value. For example, the combination may combine color coordinates (such as RGB values) into a single luminance value. As another example, the combination may combine values of neighboring pixels into a single average or differential value. In other embodiments, the combination may alternatively or additionally be a plurality of values. For example, the combination may be a data set comprising a pixel value for each of a plurality of neighboring pixels. Thus, in some embodiments, the combination may correspond to one additional dimension of the mapping (i.e. in addition to the spatial dimensions) and in other embodiments the combination may correspond to a plurality of additional dimensions of the mapping.
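
By way of a non-limiting illustration, the two single-value combinations mentioned above may be sketched as follows. The function names, the 3x3 neighborhood and the Rec. 601 luma weights are assumptions made for this example only; the text does not prescribe any particular weighting or neighborhood.

```python
def luminance(rgb):
    """Combine an (R, G, B) triple into a single luminance value.

    The Rec. 601 weights are an assumption for illustration.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def neighborhood_average(image, x, y):
    """Combine the luminances of the 3x3 neighborhood of (x, y) into one average.

    `image` is a list of rows of (R, G, B) triples; the neighborhood is
    clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    values = [
        luminance(image[j][i])
        for j in range(max(0, y - 1), min(height, y + 2))
        for i in range(max(0, x - 1), min(width, x + 2))
    ]
    return sum(values) / len(values)
```

Either function yields one additional mapping dimension per pixel; returning the list `values` itself instead of its mean would correspond to a combination comprising a plurality of values.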

A color coordinate may be any value reflecting a visual characteristic of the pixel and may specifically be a luminance value, a chroma value, a chromaticity coordinate from a chromaticity tuple (e.g. x from (x,y)), or a chrominance value, etc. The combination may in some embodiments comprise only one pixel value corresponding to an image spatial position for the input set, or may be a more complex characterization of local chromatic structure or content.

The method may include dynamically generating the mapping. For example, a new mapping may be generated for each image of a video sequence or e.g. for each Nth image where N is an integer. Note that successive N's may be of varying magnitude, e.g. determined by shot boundaries.

In accordance with an optional feature of the invention, the input image is an input high dynamic range image; and the method further comprises: receiving an input low dynamic range image corresponding to the input high dynamic range image; generating a prediction base image from the input low dynamic range image; predicting a predicted high dynamic range image from the prediction base image in response to the mapping; encoding a residual high dynamic range image in response to the predicted high dynamic range image and the input high dynamic range image to generate encoded high dynamic range data; and including the encoded high dynamic range data in the output encoded data stream.

The invention may provide improved encoding of HDR images. In particular, improved prediction of an HDR image from an LDR image may be achieved allowing a reduced residual signal and thus more efficient encoding. A reduced data rate of the enhancement layer, and thus of the combined signal, may be achieved.

The approach may allow prediction to be based on an improved and/or automatic adaptation to the specific relationship between HDR and LDR images. For example, the approach may automatically adapt to reflect the application of different tone mapping and color grading approaches whether for different sources, images or indeed parts of images. For example, the approach may adapt to specific characteristics within individual image objects.

The approach may in many scenarios allow backwards compatibility with existing LDR equipment which may simply use a base layer comprising an LDR encoding of the input image. Furthermore, the approach may allow a low complexity implementation thereby allowing reduced cost, resource requirements and usage, or facilitated design or manufacturing.

The prediction base image may specifically be generated by encoding the input low dynamic range image to generate encoded data; and generating the prediction base image by decoding the encoded data. Other base images may be generated, e.g. by processing a decoded reconstruction with e.g. compression artifact mitigation functions, or predefined color transformations, etc.

The method may comprise generating the output encoded data stream to have a first layer comprising encoded data for the input image and a second layer comprising encoded data for the residual image. The second layer may be an optional layer and specifically the first layer may be a base layer and the second layer may be an enhancement layer.

The encoding of the residual high dynamic range image may specifically comprise generating residual data for at least part of the high dynamic range image by a comparison of the input high dynamic range image and the predicted high dynamic range image; and generating at least part of the encoded high dynamic range data by encoding the residual data.
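
As a minimal sketch of such a comparison, the residual data may be formed as a per-pixel difference, which a decoder can add back to the same prediction. Treating the comparison as a plain subtraction on single-channel images, and the function names `residual` and `reconstruct`, are assumptions for illustration; the subsequent encoding of the residual data is not shown.

```python
def residual(hdr, predicted):
    """Per-pixel difference between the input HDR image and its prediction.

    Both images are lists of rows of single-channel values of equal size.
    """
    return [
        [h - p for h, p in zip(hdr_row, pred_row)]
        for hdr_row, pred_row in zip(hdr, predicted)
    ]


def reconstruct(predicted, res):
    """Decoder-side counterpart: add the residual back onto the prediction."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, res)
    ]
```

The better the prediction, the closer the residual values lie to zero, which is what may allow the enhancement layer to be encoded at a lower data rate.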

In accordance with an optional feature of the invention, each input set corresponds to a spatial interval for each spatial image dimension and at least one value interval for the combination, and the generation of the mapping comprises for each image position of at least a group of image positions of the reference low dynamic range image: determining at least one matching input set having spatial intervals corresponding to the each image position and a value interval for the combination corresponding to a combination value for the each image position in the reference low dynamic range image; and determining an output high dynamic range pixel value for the matching input set in response to a high dynamic range pixel value for the each image position in the reference high dynamic range image.

This provides an efficient and accurate approach for determining a suitable mapping for dynamic range modification.

In some embodiments, a plurality of matching input sets may be determined for at least a first position of the at least a group of image positions, and output high dynamic range pixel values may be determined for each of the plurality of matching input sets in response to a high dynamic range pixel value for the first position in the reference high dynamic range image.

In some embodiments the method further comprises determining the output high dynamic range pixel value for a first input set in response to an averaging of contributions from all high dynamic range pixel values for image positions of the at least a group of image positions which match the first input set.
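
The generation of the mapping from a pair of reference images may be illustrated by the following sketch, in which each input set is a bin spanning a spatial interval in x and y and a value interval of the combination, and the output value of a bin is the average of the reference HDR values that fall into it. The name `build_mapping`, the use of a single integer intensity per LDR pixel as the combination, the uniform interval sizes `s_step` and `v_step`, and the assignment of each position to exactly one bin are assumptions made for illustration.

```python
def build_mapping(ref_ldr, ref_hdr, s_step, v_step):
    """Build a {(x_bin, y_bin, value_bin): mean HDR value} mapping.

    ref_ldr: rows of integer LDR combination values (e.g. intensities).
    ref_hdr: rows of HDR pixel values at the same positions.
    s_step:  size of the spatial interval (in pixels) of each input set.
    v_step:  size of the combination value interval of each input set.
    """
    sums, counts = {}, {}
    for y, (ldr_row, hdr_row) in enumerate(zip(ref_ldr, ref_hdr)):
        for x, (ldr_value, hdr_value) in enumerate(zip(ldr_row, hdr_row)):
            # Matching input set for this position and combination value.
            key = (x // s_step, y // s_step, ldr_value // v_step)
            sums[key] = sums.get(key, 0.0) + hdr_value
            counts[key] = counts.get(key, 0) + 1
    # Average the contributions of all positions matching each input set.
    return {key: sums[key] / counts[key] for key in sums}
```

For example, with `s_step=2` and `v_step=64`, the reference pair `[[10, 12], [200, 210]]` and `[[100, 140], [4000, 4200]]` gives two populated bins: one for the dark pixels with output 120.0 and one for the bright pixels with output 4100.0. Bins that no reference position falls into simply remain absent from the dictionary.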

In accordance with an optional feature of the invention, the mapping is at least one of: a spatially subsampled mapping; a temporally subsampled mapping; and a combination value subsampled mapping, e.g. a number of local pixel color combinations or combination tuples may be calculated and they may be used as entries to estimate the predicted HDR values.

This may in many embodiments provide an improved efficiency and/or reduced data rate or resource requirements while still allowing advantageous operation. The temporal subsampling may comprise updating the mapping for a subset of images of a sequence of images. The combination value subsampling may comprise application of a coarser quantization of one or more values of the combination than resulting from the quantization of the pixel values. The spatial subsampling may comprise each input set covering a plurality of pixel positions.

In accordance with an optional feature of the invention, the input image is an input high dynamic range image; and the method further comprises: receiving an input low dynamic range image corresponding to the input high dynamic range image; generating a prediction base image from the input low dynamic range image; predicting a predicted high dynamic range image from the prediction base image in response to the mapping; and adapting at least one of the mapping and a residual high dynamic range image for the predicted high dynamic range image in response to a comparison of the input high dynamic range image and the predicted high dynamic range image.

This may allow an improved encoding and may in many embodiments allow the data rate to be adapted to specific image characteristics. For example, the data rate may be reduced to a level required for a given quality level with a dynamic adaptation of the data rate to achieve a variable minimum data rate.

In some embodiments, the adaptation may comprise determining whether to modify part or all of the mapping. For example, if the mapping results in a predicted high dynamic range image which deviates more than a given amount from the input high dynamic range image, the mapping may be partially or fully modified to result in an improved prediction. For example, the adaptation may comprise modifying specific high dynamic range pixel values provided by the mapping for specific input sets.

In some embodiments, the method may include a selection of elements of at least one of mapping data and residual high dynamic range image data to include in the output encoded data stream in response to a comparison of the input high dynamic range image and the predicted high dynamic range image. The mapping data and/or the residual high dynamic range image data may for example be restricted to areas wherein the difference between the input high dynamic range image and the predicted high dynamic range image exceeds a given threshold.
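
A minimal sketch of such a threshold-based restriction is given below. Selecting individual pixels (rather than larger areas such as blocks), the use of an absolute difference, and the name `select_residual` are assumptions for illustration.

```python
def select_residual(hdr, predicted, threshold):
    """Keep residual data only where the prediction error exceeds a threshold.

    Returns a {(x, y): residual} dictionary; positions that are predicted
    to within `threshold` are omitted and would not be encoded.
    """
    selected = {}
    for y, (hdr_row, pred_row) in enumerate(zip(hdr, predicted)):
        for x, (h, p) in enumerate(zip(hdr_row, pred_row)):
            difference = h - p
            if abs(difference) > threshold:
                selected[(x, y)] = difference
    return selected
```

Raising the threshold trades reconstruction accuracy against the amount of residual data included in the output encoded data stream.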

In accordance with an optional feature of the invention, the input image is the reference high dynamic range image and the reference low dynamic range image is an input low dynamic range image corresponding to the input image.

This may in many embodiments allow a highly efficient prediction of a high dynamic range image from an input low dynamic range image, and may in many scenarios provide a particularly efficient encoding of both low and high dynamic range images. The method may further comprise including mapping data characterizing at least part of the mapping in the output encoded data stream.

In accordance with an optional feature of the invention, the input sets for the mapping further comprise depth indications associated with image spatial positions and the mapping further reflects a relationship between depth and high dynamic range pixel values. This may provide an improved mapping and may for example allow the mapping to be used to generate an improved prediction for the input image. The approach may allow a reduced data rate for a given quality level. A depth indication may be any suitable indication of depth in the image including a depth (z direction) value or a disparity value. E.g., depth may be related to shadows on objects, and the prediction may implicitly or explicitly estimate such.

In accordance with an optional feature of the invention, the input image corresponds to a high dynamic range first view image of a multi-view image and the method further comprises: encoding a high dynamic range second view image for the multi-view image in response to the high dynamic range first view image.

The approach may allow a particularly efficient encoding of multi-view images and may allow an improved data rate to quality ratio and/or facilitated implementation. The multi-view image may be an image comprising a plurality of images corresponding to different views of the same scene. The multi-view image may specifically be a stereo image comprising a right and left image (e.g. corresponding to a viewpoint for the right and left eye of a viewer). The high dynamic range first view image may specifically be used to generate a prediction (or an additional prediction) of the high dynamic range second view image. In some cases, the high dynamic range first view image may be used directly as a prediction for the high dynamic range second view image. The approach may allow for a highly efficient joint/combined encoding of LDR/HDR multi-view images. The high dynamic range image may specifically be the high dynamic range first view image.

In accordance with an optional feature of the invention, the high dynamic range first view image and the high dynamic range second view image are jointly encoded with the high dynamic range first view image being encoded without being dependent on the high dynamic range second view image and the high dynamic range second view image being encoded using data from the high dynamic range first view image, the encoded data being split into separate data streams including a primary data stream comprising data for the high dynamic range first view image and a secondary data stream comprising data for the high dynamic range second view image, wherein the primary and secondary data streams are multiplexed into the output encoded data stream with data for the primary and secondary data streams being provided with separate codes.

This may provide a particularly efficient encoding of a data stream of multi-view images which may allow improved backwards compatibility. The approach may combine advantages of joint encoding of multi-view HDR images with backwards compatibility allowing non-fully capable decoders to efficiently decode single view images.
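
The role of the separate codes may be illustrated with the following toy sketch, in which each packet of the multiplexed stream carries a code identifying the data stream it belongs to, so that a decoder limited to a single view can keep the primary packets and discard the rest. The code values, the packet representation as (code, payload) pairs and the function names are hypothetical and are not taken from any standard.

```python
PRIMARY_CODE = 0x01    # hypothetical code for first view (primary) data
SECONDARY_CODE = 0x02  # hypothetical code for second view (secondary) data


def multiplex(primary_packets, secondary_packets):
    """Interleave the two data streams, tagging every packet with its code."""
    stream = []
    for primary, secondary in zip(primary_packets, secondary_packets):
        stream.append((PRIMARY_CODE, primary))
        stream.append((SECONDARY_CODE, secondary))
    return stream


def extract(stream, code):
    """Return the payloads of all packets carrying the given code."""
    return [payload for packet_code, payload in stream if packet_code == code]
```

A single view decoder would call `extract(stream, PRIMARY_CODE)` and never needs to parse the secondary data, since the primary stream was encoded without dependence on the second view.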

In accordance with an optional feature of the invention, an encoding module comprises an image data input for receiving image data for an image to be encoded, a prediction input for receiving a prediction for the image to be encoded, and a data output for outputting encoding data for the image to be encoded, the encoding module being operable to generate the encoding data from the prediction and the image data; and encoding the high dynamic range first view image is performed by the encoding module when receiving a prediction generated from the mapping on the prediction input and image data for the high dynamic range image on the image data input, and encoding of the high dynamic range second view image is performed by the encoding module when receiving a prediction generated from the high dynamic range first view image on the prediction input and image data for the high dynamic range second view image on the image data input.

This may allow a particularly efficient and/or low complexity encoding. The encoding module may advantageously be reused for different functionality. The encoding module may for example be an H.264 single view encoding module.

In accordance with an aspect of the invention, there is provided a method of generating a high dynamic range image from a low dynamic range image, the method comprising: receiving the low dynamic range image; providing a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image; and generating the high dynamic range image in response to the low dynamic range image and the mapping.

The invention may allow a particularly efficient approach for generating a high dynamic range image from a low dynamic range image.

The method may specifically be a method of decoding a high dynamic range image. The low dynamic range image may be received as an encoded image which is first decoded after which the mapping is applied to the decoded low dynamic range image to provide a high dynamic range image. Specifically, the low dynamic range image may be generated by decoding a base layer image of an encoded data stream.

The reference low dynamic range image and a corresponding reference high dynamic range image may e.g. be previously decoded images. In some embodiments, the low dynamic range image may be received in an encoded data stream which may also comprise data characterizing or identifying the mapping and/or one or both of the reference images.

In accordance with an optional feature of the invention, generating the high dynamic range image comprises determining at least part of a predicted high dynamic range image by for each position of at least part of the predicted high dynamic range image: determining at least one matching input set matching the each position and a first combination of color coordinates of low dynamic range pixel values associated with the each position; retrieving from the mapping at least one output high dynamic range pixel value for the at least one matching input set; determining a high dynamic range pixel value for the each position in the predicted high dynamic range image in response to the at least one output high dynamic range pixel value; and determining the high dynamic range image in response to the at least part of the predicted high dynamic range image.

This may provide a particularly advantageous generation of a high dynamic range image. In many embodiments, the approach may allow a particularly efficient encoding of both low and high dynamic range images. In particular, an accurate, automatically adapting and/or efficient generation of a prediction of a high dynamic range image from a low dynamic range image can be achieved.

The generation of the high dynamic range image in response to the at least part of the predicted high dynamic range image may comprise using the at least part of the predicted high dynamic range image directly or may e.g. comprise enhancing the at least part of the predicted high dynamic range image using residual high dynamic range data, which e.g. may be comprised in a different layer of an encoded signal than a layer comprising the low dynamic range image.
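
The decoder-side use of the mapping may be sketched as follows. The mapping is assumed here to be a dictionary keyed by (x bin, y bin, value bin) with uniform interval sizes `s_step` and `v_step`, and each position is assumed to match exactly one input set; this layout, the `fallback` value for unpopulated bins and the function names are assumptions for illustration only.

```python
def predict_hdr(ldr, mapping, s_step, v_step, fallback=0.0):
    """Predict an HDR image by looking up each LDR pixel in the mapping.

    ldr:     rows of integer LDR combination values (e.g. intensities).
    mapping: {(x_bin, y_bin, value_bin): output HDR value}.
    """
    predicted = []
    for y, row in enumerate(ldr):
        out_row = []
        for x, value in enumerate(row):
            # Matching input set for this position and combination value.
            key = (x // s_step, y // s_step, value // v_step)
            out_row.append(mapping.get(key, fallback))
        predicted.append(out_row)
    return predicted


def enhance(predicted, residual=None):
    """Use the prediction directly, or add residual HDR data when available."""
    if residual is None:
        return predicted
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]
```

A decoder that only receives the layer comprising the low dynamic range image can stop after `predict_hdr`, whereas a decoder that also receives residual data can pass it to `enhance`.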

In accordance with an optional feature of the invention, the low dynamic range image is an image of a low dynamic range video sequence and the method comprises generating the mapping using a previous low dynamic range image of the low dynamic range video sequence as the reference low dynamic range image and a previous high dynamic range image generated for the previous low dynamic range image as the reference high dynamic range image.

This may allow an efficient operation and may in particular allow efficient encoding of video sequences with corresponding low and high dynamic range images. For example, the approach may allow an accurate encoding based on a prediction of at least part of a high dynamic range image from a low dynamic range image without requiring any information of the applied mapping to be communicated between the encoder and decoder.

In accordance with an optional feature of the invention, the previous high dynamic range image is further generated in response to residual image data for the previous low dynamic range image relative to predicted image data for the previous low dynamic range image.

This may provide a particularly accurate mapping and thus improved prediction.
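
The following sketch illustrates how a decoder might regenerate the mapping from its own previously decoded images, so that no mapping data needs to be transmitted. To keep the example short, each frame is a flat list of integer intensities, the spatial dimensions of the mapping are omitted, and the mapping for the first frame is supplied as a hypothetical `initial_mapping`; all names and these simplifications are assumptions for illustration.

```python
def build_mapping(ldr, hdr, v_step):
    """Average the HDR values falling into each LDR value interval."""
    sums, counts = {}, {}
    for ldr_value, hdr_value in zip(ldr, hdr):
        key = ldr_value // v_step
        sums[key] = sums.get(key, 0.0) + hdr_value
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def decode_sequence(ldr_frames, residual_frames, initial_mapping, v_step):
    """Decode HDR frames, rebuilding the mapping from each decoded pair."""
    mapping = dict(initial_mapping)
    decoded = []
    for ldr, res in zip(ldr_frames, residual_frames):
        predicted = [mapping.get(value // v_step, 0.0) for value in ldr]
        # The residual corrects the prediction, so the reference HDR image
        # used for the next mapping is the corrected image.
        hdr = [p + r for p, r in zip(predicted, res)]
        decoded.append(hdr)
        mapping = build_mapping(ldr, hdr, v_step)
    return decoded
```

An encoder running the same loop on its own reconstructed images would arrive at identical mappings, which is why only the low dynamic range images and the residual data need to be communicated in this sketch.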

In accordance with an optional feature of the invention, the low dynamic range image is an image of a low dynamic range video sequence, and the method further comprises using a nominal mapping for at least some low dynamic range images of the low dynamic range video sequence.

This may allow particularly efficient encoding for many images and may in particular allow an efficient adaptation to different images of a video sequence. For example, a nominal mapping may be used for images for which no suitable reference images exist, such as e.g. the first image following a scene change.

In some embodiments, the low dynamic range video sequence may be received as part of an encoded video signal which further comprises a reference mapping indication for the low dynamic range images for which the reference mapping is used. In some embodiments, the reference mapping indication is indicative of an applied reference mapping selected from a predetermined set of reference mappings. For example, N reference mappings may be predetermined between an encoder and decoder and the encoding may include an indication of which of the reference mappings should be used for the specific image by the decoder.
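
A minimal sketch of such a selection is shown below. The table contents are arbitrary placeholder values, and the convention that an absent indication (`None`) means that the adaptively generated mapping is used is an assumption for illustration; the text does not define how the indication is signalled.

```python
# Hypothetical predetermined set, assumed to be known to both encoder and
# decoder. Keys are LDR value bins; values are placeholder HDR outputs.
REFERENCE_MAPPINGS = [
    {0: 50.0, 1: 400.0, 2: 1500.0, 3: 4000.0},
    {0: 20.0, 1: 150.0, 2: 600.0, 3: 2000.0},
]


def mapping_for_image(indication, adaptive_mapping):
    """Pick the mapping for one image.

    indication: index into REFERENCE_MAPPINGS signalled for this image, or
                None when no reference mapping indication is present.
    """
    if indication is None:
        return adaptive_mapping
    return REFERENCE_MAPPINGS[indication]
```

Since only a small index is transmitted, signalling the reference mapping for, e.g., the first image following a scene change costs very little data.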

In accordance with an optional feature of the invention, the combination is indicative of at least one of a texture, gradient, and spatial pixel value variation for the image spatial positions.

This may provide a particularly advantageous generation of a high dynamic range image, and may in particular generate more appealing high dynamic range images.
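
As one possible example of a gradient-indicative combination, the following sketch computes an unnormalized gradient magnitude from differences of neighboring pixel values. The choice of central differences, the clipping at the image borders and the function name are assumptions for illustration.

```python
def gradient_magnitude(image, x, y):
    """Unnormalized gradient magnitude at (x, y) of a single-channel image.

    Uses the difference between the right/left and lower/upper neighbors,
    with neighbor coordinates clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
    gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
    return (gx * gx + gy * gy) ** 0.5
```

Quantizing this value into intervals would give the mapping an additional dimension that distinguishes flat regions from edges, so that different high dynamic range outputs can be stored for each.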

In accordance with an optional feature of the invention, the input sets for the mapping further comprise depth indications associated with image spatial positions, and the mapping further reflects a relationship between depth and high dynamic range pixel values.

This may provide an improved mapping and may for example allow the mapping to be used to generate an improved prediction of the high dynamic range image. The approach may e.g. allow a reduced data rate for a given quality level. A depth indication may be any suitable indication of depth in the image including a depth (z direction) value or a disparity value.

In accordance with an optional feature of the invention, the high dynamic range image corresponds to a first view image of a multi-view image and the method further comprises: generating a high dynamic range second view image for the multi-view image in response to the high dynamic range image.

The approach may allow a particularly efficient generation/decoding of multi-view images and may allow an improved data rate to quality ratio and/or facilitated implementation. The multi-view image may be an image comprising a plurality of images corresponding to different views of the same scene. The multi-view image may specifically be a stereo image comprising a right and left image (e.g. corresponding to a viewpoint for the right and left eye of a viewer). The high dynamic range first view image may specifically be used to generate a prediction of the high dynamic range second view image. In some cases, the high dynamic range first view image may be used directly as a prediction for the high dynamic range second view image. The approach may allow for a highly efficient joint/combined decoding of LDR/HDR multi-view images.

In accordance with an optional feature of the invention, a decoding module comprises an encoder data input for receiving encoded data for an encoded image, a prediction input for receiving a prediction image for the encoded image, and a data output for outputting a decoded image, the decoding module being operable to generate the decoded image from the prediction image and the encoder data; and wherein generating the high dynamic range image is performed by the decoding module when receiving a prediction generated from the mapping on the prediction input and residual image data for the high dynamic range image on the encoder data input, and generating the high dynamic range second view image is performed by the decoding module when receiving a prediction image generated from the high dynamic range image on the prediction input and residual image data for the high dynamic range second view image on the encoder data input.

This may allow a particularly efficient and/or low complexity decoding. The decoding module may advantageously be reused for different functionality. The decoding module may for example be an H.264 single view decoding module.

In accordance with an optional feature of the invention, the decoding module comprises a plurality of prediction image memories arranged to store prediction images generated from previous decoded images; and the decoding module overwrites one of the prediction image memories with the prediction image received on the prediction input.

This may allow a particularly efficient implementation and/or operation.

In accordance with an optional feature of the invention, the step of generating the high dynamic range second view image comprises: providing a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of high dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a relationship between a reference high dynamic range image for the first view and a corresponding reference high dynamic range image for the second view; and generating the high dynamic range second view image in response to the high dynamic range image and the mapping.

This may provide a particularly advantageous approach to generating the high dynamic range second view image based on the high dynamic range first view image. In particular, it may allow an accurate mapping or prediction which is based on reference images. The generation of the high dynamic range second view image may be based on an automatic generation of a mapping and may e.g. be based on a previous high dynamic range second view image and a previous high dynamic range first view image. The approach may e.g. allow the mapping to be generated independently at an encoder and decoder side and thus allows efficient encoder/decoder prediction based on the mapping without necessitating any additional mapping data being communicated from the encoder to the decoder.

According to an aspect of the invention there is provided a device for encoding an input image, the device comprising: a receiver for receiving the input image; a mapping generator for generating a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values in response to a reference low dynamic range image and a corresponding reference high dynamic range image; and an output processor for generating an output encoded data stream by encoding the input image in response to the mapping. The device may for example be an integrated circuit or part thereof.

According to an aspect of the invention there is provided an apparatus comprising: the device of the previous paragraph; input connection means for receiving a signal comprising the input image and feeding it to the device; and output connection means for outputting the output encoded data stream from the device. Such an apparatus may e.g. be an encoding unit connected to a color grading device, or a part of a camera, or a device for creating stored copies of content, etc.

According to an aspect of the invention there is provided a device for generating a high dynamic range image from a low dynamic range image, the device comprising: a receiver for receiving the low dynamic range image; a mapping processor for providing a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image; and an image generator for generating the high dynamic range image in response to the low dynamic range image and the mapping. The device may for example be an integrated circuit or part thereof.

According to an aspect of the invention there is provided an apparatus comprising: the device of the previous paragraph; input connection means for receiving the low dynamic range image and feeding it to the device; and output connection means for outputting a signal comprising the high dynamic range image from the device. The apparatus may for example be a set-top box, a television, a computer monitor or other display, a media player, a DVD or BluRay™ player etc.

According to an aspect of the invention there is provided an encoded signal comprising: an encoded low dynamic range image; and residual image data for the low dynamic range image, at least part of the residual image data being indicative of a difference between a desired high dynamic range image corresponding to the low dynamic range image and a predicted high dynamic range image resulting from application of a mapping to the encoded low dynamic range image, where the mapping relates input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image.

According to a feature of the invention there is provided a storage medium comprising the encoded signal of the previous paragraph. The storage medium may for example be a data carrier such as a DVD or BluRay™ disc.

A computer program product for executing the method of any of the aspects or features of the invention may be provided. Also, a storage medium comprising executable code for executing the method of any of the aspects or features of the invention may be provided.

These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which

FIG. 1 is an illustration of an example of a transmission system in accordance with some embodiments of the invention;

FIG. 2 is an illustration of an example of an encoder in accordance with some embodiments of the invention;

FIG. 3 is an illustration of an example of a method of encoding in accordance with some embodiments of the invention;

FIGS. 4 and 5 are illustrations of examples of mappings in accordance with some embodiments of the invention;

FIG. 6 is an illustration of an example of an encoder in accordance with some embodiments of the invention;

FIG. 7 is an illustration of an example of an encoder in accordance with some embodiments of the invention;

FIG. 8 is an illustration of an example of a method of decoding in accordance with some embodiments of the invention;

FIG. 9 is an illustration of an example of a prediction of a high dynamic range image in accordance with some embodiments of the invention;

FIG. 10 illustrates an example of a mapping in accordance with some embodiments of the invention;

FIG. 11 is an illustration of an example of a decoder in accordance with some embodiments of the invention;

FIG. 12 is an illustration of an example of an encoder in accordance with some embodiments of the invention;

FIG. 13 is an illustration of an example of a basic encoding module that may be used in encoders in accordance with some embodiments of the invention;

FIGS. 14-17 illustrate examples of encoders using the basic encoding module of FIG. 13;

FIG. 18 illustrates an example of a multiplexing of data streams;

FIG. 19 is an illustration of an example of a basic decoding module that may be used in decoders in accordance with some embodiments of the invention; and

FIGS. 20-22 illustrate examples of decoders using the basic decoding module of FIG. 19.

DETAILED DESCRIPTION OF SOME EMBODIMENTS OF THE INVENTION

The following description focuses on embodiments of the invention applicable to encoding and decoding of corresponding low dynamic range and high dynamic range images of video sequences. However, it will be appreciated that the invention is not limited to this application and that the described principles may be applied in many other scenarios and may e.g. be applied to enhance or modify dynamic ranges of a large variety of images.

FIG. 1 illustrates a transmission system 100 for communication of a video signal in accordance with some embodiments of the invention. The transmission system 100 comprises a transmitter 101 which is coupled to a receiver 103 through a network 105 which specifically may be the Internet or e.g. a broadcast system such as a digital television broadcast system.

In the specific example, the receiver 103 is a signal player device but it will be appreciated that in other embodiments the receiver may be used in other applications and for other purposes. In the particular example, the receiver 103 may be a display, such as a television, or may be a set top box for generating a display output signal for an external display such as a computer monitor or a television.

In the specific example, the transmitter 101 comprises a signal source 107 which provides a video sequence of low dynamic range images and a corresponding video sequence of high dynamic range images. Corresponding images represent the same scene/image but with different dynamic ranges. Typically, the low dynamic range image may be generated from the corresponding high dynamic range image by a suitable color grading that may have been performed automatically, semi-automatically or manually. In some embodiments, the high dynamic range image may be generated from the low dynamic range image, or they may be generated in parallel, such as e.g. for computer generated images.

It will be appreciated that the terms low dynamic range image and high dynamic range image do not specify any specific absolute dynamic ranges for the images but are merely relative terms that relate the images to each other such that a high dynamic range image has a (potentially) higher dynamic range than the low dynamic range image.

The signal source 107 may itself generate the low dynamic range image, the high dynamic range image or both the low and high dynamic range images or may e.g. receive one or both of these from an external source.

The signal source 107 is coupled to the encoder 109 which proceeds to encode the high and low dynamic range video sequences in accordance with an encoding algorithm that will be described in detail later. The encoder 109 is coupled to a network transmitter 111 which receives the encoded signal and interfaces to the communication network 105. The network transmitter may transmit the encoded signal to the receiver 103 through the communication network 105. It will be appreciated that in many other embodiments, other distribution or communication networks may be used, such as e.g. a terrestrial or satellite broadcast system.

The receiver 103 comprises a receiver 113 which interfaces to the communication network 105 and which receives the encoded signal from the transmitter 101. In some embodiments, the receiver 113 may for example be an Internet interface, or a wireless or satellite receiver.

The receiver 113 is coupled to a decoder 115. The decoder 115 is fed the received encoded signal and it then proceeds to decode it in accordance with a decoding algorithm that will be described in detail later. The decoder 115 may specifically generate a high dynamic range video sequence from the received encoded data.

In the specific example where a signal playing function is supported, the receiver 103 further comprises a signal player 117 which receives the decoded video signal from the decoder 115 and presents this to the user using suitable functionality. Specifically, the signal player 117 may itself comprise a display that can present the decoded video sequence. Alternatively or additionally, the signal player 117 may comprise an output circuit that can generate a suitable drive signal for an external display apparatus. Thus, the receiver 103 may comprise an input connection means receiving the encoded video sequence and an output connection means providing an output drive signal for a display.

FIG. 2 illustrates an example of the encoder 109 in accordance with some embodiments of the invention. FIG. 3 illustrates an example of a method of encoding in accordance with some embodiments of the invention.

The encoder comprises a receiver 201 for receiving a video sequence of the low dynamic range images, henceforth referred to as the LDR images, and a receiver 203 for receiving a corresponding video sequence of high dynamic range images, henceforth referred to as the HDR images.

Initially the encoder 109 performs step 301 wherein an input LDR image of the LDR video sequence is received. The LDR images are fed to an LDR encoder 205 which encodes the video images from the LDR video sequence. It will be appreciated that any suitable video or image encoding algorithm may be used and that the encoding may specifically include motion compensation, quantization, transform conversion etc. as will be known to the skilled person. Specifically, the LDR encoder 205 may be an H.264/AVC standard encoder.

Thus, step 301 is followed by step 303 wherein the input LDR image is encoded to generate an encoded LDR image.

The encoder 109 then proceeds to generate a predicted HDR image from the LDR image. The prediction is based on a prediction base image which may for example be the input LDR image itself. However, in many embodiments the prediction base image may be generated to correspond to the LDR image that can be generated by the decoder by decoding the encoded LDR image.

In the example of FIG. 2, the LDR encoder 205 is accordingly coupled to an LDR decoder 207 which proceeds to generate the prediction base image by a decoding of encoded data of the LDR image. The decoding may be of the actual output data stream or may be of an intermediate data stream, such as e.g. of the encoded data stream prior to a final lossless entropy coding. Thus, the LDR decoder 207 performs step 305 wherein the prediction base image is generated by decoding the encoded LDR image.

The LDR decoder 207 is coupled to a predictor 209 which proceeds to generate a predicted HDR image from the prediction base image. The prediction is based on a mapping provided by a mapping processor 211.

Thus, in the example, step 305 is followed by step 307 wherein the mapping is generated and subsequently step 309 wherein the prediction is performed to generate the predicted HDR image.

The predictor 209 is further coupled to an HDR encoder 213 which is further coupled to the HDR receiver 203. The HDR encoder 213 receives the input HDR image and the predicted HDR image and proceeds to encode the input HDR image based on the predicted HDR image.

As a specific low complexity example, the encoding of the HDR image may be based on generating a residual HDR image relative to the predicted HDR image and encoding the residual HDR image. Thus, in such an example, the HDR encoder 213 may proceed to perform step 311 wherein a residual HDR image is generated in response to a comparison between the input HDR image and the predicted HDR image. Specifically, the HDR encoder 213 may generate the residual HDR image by subtracting the predicted HDR image from the input HDR image. Thus, the residual HDR image represents the error between the input HDR image and that which is predicted based on the corresponding (encoded) LDR image. In other embodiments, other comparisons may be made. For example, a division of the HDR image by the predicted HDR image may be employed.
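The subtraction-based residual of step 311 can be sketched as follows. This is a minimal illustration only: images are represented as nested lists of single-component values, and the function names `residual_image` and `reconstruct` are hypothetical and not taken from the described encoder.

```python
def residual_image(input_hdr, predicted_hdr):
    """Residual = input HDR minus predicted HDR, pixel by pixel (step 311)."""
    return [[h - p for h, p in zip(hdr_row, pred_row)]
            for hdr_row, pred_row in zip(input_hdr, predicted_hdr)]


def reconstruct(predicted_hdr, residual):
    """Inverse operation: adding the residual back to the prediction."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted_hdr, residual)]


input_hdr = [[1000, 1200], [800, 4000]]
predicted_hdr = [[990, 1210], [800, 3900]]

residual = residual_image(input_hdr, predicted_hdr)
restored = reconstruct(predicted_hdr, residual)
```

In this toy case the residual values are small compared with the HDR values themselves, which is what makes the residual cheaper to encode when the prediction is accurate.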

The HDR encoder 213 may then perform step 313 wherein the residual image is encoded to generate encoded residual data.

It will be appreciated that any suitable encoding principle or algorithm for encoding the residual image may be used. Indeed, in many embodiments the predicted HDR image may be used as one possible prediction out of several. Thus, in some embodiments the HDR encoder 213 may be arranged to select between a plurality of predictions including the predicted HDR image. Other predictions may include spatial or temporal predictions. The selection may be based on an accuracy measure for the different predictions, such as on an amount of residual relative to the HDR input image. The selection may be performed for the whole image or may for example be performed individually for different areas or regions of the HDR image.

For example, the HDR encoder may be an H.264 encoder. A conventional H.264 encoder may utilize different predictions such as a temporal prediction (between frames, e.g. motion compensation) or spatial prediction (i.e. predicting one area of the image from another). In the approach of FIG. 2, such predictions may be supplemented by the LDR to HDR image prediction. The H.264 based encoder then proceeds to select between the different possible predictions. This selection is performed on a macroblock basis and is based on selecting the prediction that results in the lowest residual for that macroblock. Specifically, a rate distortion analysis may be performed to select the best prediction approach for each macroblock. Thus, a local decision is made.

Accordingly, the H.264 based encoder may use different prediction approaches for different macroblocks. For each macroblock the residual data may be generated and encoded. Thus, the encoded data for the input HDR image may comprise residual data for each macroblock resulting from the specific selected prediction for that macroblock. In addition, the encoded data may comprise an indication of which prediction approach is used for each individual macroblock.

Thus, the LDR to HDR prediction may provide an additional possible prediction that can be selected by the encoder. For some macroblocks, this prediction may result in a lower residual than other predictions and accordingly it will be selected for this macroblock. The resulting residual image for that block will then represent the difference between the input HDR image and the predicted HDR image for that block.
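The per-macroblock selection can be sketched as below. The sketch assumes the sum of absolute differences (SAD) as the measure of "lowest residual"; this is a simplification chosen for illustration and stands in for the rate distortion analysis mentioned above. The candidate names and the function `select_prediction` are hypothetical.

```python
def sad(block, prediction):
    """Sum of absolute differences between a block and one candidate prediction."""
    return sum(abs(b - p) for b, p in zip(block, prediction))


def select_prediction(block, candidates):
    """Pick the candidate with the smallest SAD for this macroblock.

    candidates maps a prediction-mode name to a predicted block (flat list).
    Returns the chosen mode name, which would be signalled in the stream,
    together with the residual for that mode.
    """
    best = min(candidates, key=lambda name: sad(block, candidates[name]))
    residual = [b - p for b, p in zip(block, candidates[best])]
    return best, residual


block = [100, 110, 120, 130]          # toy 4-sample "macroblock" of input HDR values
candidates = {
    "temporal": [90, 100, 110, 120],  # e.g. motion-compensated prediction
    "spatial": [100, 100, 100, 100],  # e.g. prediction from a neighbouring area
    "ldr_to_hdr": [101, 109, 121, 130],
}
mode, residual = select_prediction(block, candidates)
```

Here the LDR to HDR candidate has the smallest SAD (3, against 40 and 60), so it is selected for this block and its residual is the one encoded.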

The encoder may in the example use a selection between the different prediction approaches rather than a combination of these, since a combination would typically result in the different predictions interfering with each other.

The LDR encoder 205 and the HDR encoder 213 are coupled to an output processor 215 which receives the encoded LDR data and the encoded residual data. The output processor 215 then proceeds to perform step 315 wherein an output encoded data stream is generated to include the encoded LDR data and the encoded residual data.

In the example, the generated output encoded data stream is a layered data stream and the encoded LDR data is included in a first layer with the encoded residual data being included in a second layer. The second layer may specifically be an optional layer that can be discarded by decoders or devices that are not compatible with the HDR processing. Thus, the first layer may be a base layer with the second layer being an optional layer, and specifically the second layer may be an enhancement or optional dynamic range modification layer. Such an approach may allow backwards compatibility while allowing HDR capable equipment to utilize the additional HDR information. Furthermore, the use of prediction and residual image encoding allows a highly efficient encoding with a low data rate for a given image quality.
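The backwards-compatible layering can be illustrated with a toy container. The list-of-tuples layout, the layer labels and the function names are purely hypothetical and do not correspond to any real bitstream syntax; the sketch only shows that a legacy device reads the base layer and skips the optional one.

```python
def build_stream(encoded_ldr, encoded_residual):
    """Toy layered stream: base layer first, optional enhancement layer second."""
    return [("base", encoded_ldr), ("enhancement", encoded_residual)]


def layers_for(stream, supports_hdr):
    """Return the payloads a device would actually decode."""
    return [payload for layer, payload in stream
            if layer == "base" or supports_hdr]


stream = build_stream(b"ldr-data", b"hdr-residual-data")
legacy_view = layers_for(stream, supports_hdr=False)
hdr_view = layers_for(stream, supports_hdr=True)
```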

In the example of FIG. 2, the prediction of the HDR image is based on a mapping. The mapping is arranged to map from input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values.

Thus a mapping, which specifically may be implemented as a look-up-table, is based on input data which is defined by a number of parameters organized in input sets. Thus, the input sets may be considered to be multi-dimensional sets that comprise values for a number of parameters. The parameters include spatial dimensions and specifically may comprise a two dimensional image position, such as e.g. a parameter (range) for a horizontal dimension and a parameter (range) for a vertical dimension. Specifically, the mapping may divide the image area into a plurality of spatial blocks with a given horizontal and vertical extension.

For each spatial block, the mapping may then comprise one or more parameters generated from color coordinates of low dynamic range pixel values. As a simple example, each input set may include a single luminance value in addition to the spatial parameters. Thus, in this case each input set is a three dimensional set with two spatial parameters and one luminance parameter.

For the various possible input sets, the mapping provides an output high dynamic range pixel value. Thus, the mapping may in the specific example be a mapping from three dimensional input data to a single high dynamic range pixel value.

The mapping thus provides both a spatial and color component (including a luminance only component) dependent mapping to a suitable high dynamic range pixel value.

The mapping processor 211 is arranged to generate the mapping in response to a reference low dynamic range image and a corresponding reference high dynamic range image. Thus, the mapping is not a predetermined or fixed mapping but is rather a mapping that may be automatically and flexibly generated/updated based on reference images.

The reference images may specifically be images from the video sequences. Thus, the mapping is dynamically generated from images of the video sequence thereby providing an automated adaptation of the mapping to the specific images.

As a specific example, the mapping may be based on the actual LDR and corresponding HDR image that are being encoded. In this example, the mapping may be generated to reflect a spatial and color component relationship between the input LDR and the input HDR images.

As a specific example, the mapping may be generated as a three dimensional grid of NX×NY×NI bins (input sets). Such a grid approach provides a lot of flexibility in terms of the degree of quantization applied to the three dimensions. In the example, the third (non-spatial) dimension is an intensity parameter which simply corresponds to a luminance value. In the examples below, the prediction of the HDR image is performed at macro-block level and with 2⁸ intensity bins (i.e. using 8 bit values). For a High Definition image this means that the grid has dimensions of 120×68×256 bins. Each of the bins corresponds to an input set for the mapping.
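As a check on the quoted grid dimensions, the bin counts can be worked out under two assumptions that are not stated explicitly in the text: a High Definition frame of 1920×1080 pixels and macro-blocks of 16×16 pixels.

```latex
\begin{aligned}
N_X &= \left\lceil 1920 / 16 \right\rceil = 120, \\
N_Y &= \left\lceil 1080 / 16 \right\rceil = \left\lceil 67.5 \right\rceil = 68, \\
N_I &= 2^{8} = 256, \\
N_X \times N_Y \times N_I &= 120 \times 68 \times 256 = 2\,088\,960 \text{ bins}.
\end{aligned}
```

The vertical count rounds up because 1080 is not a multiple of 16, so the last row of macro-blocks is only partly covered by the image.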

For each pixel at position (x,y) in the reference images, with intensities $V_{LDR}$ and $V_{HDR}$ in the LDR and HDR image respectively for the color component under consideration (e.g. if each color component is considered separately), the matching bin for position and intensity is first identified.

In the example, each bin corresponds to a spatial horizontal interval, a spatial vertical interval and an intensity interval. The matching bin (i.e. input set) may be determined by means of nearest neighbor interpolation:

$$I_x = \left[ x / s_x \right],$$

$$I_y = \left[ y / s_y \right],$$

$$I_I = \left[ V_{LDR} / s_I \right],$$

where $I_x$, $I_y$ and $I_I$ are the grid coordinates in the horizontal, vertical and intensity directions, respectively, $s_x$, $s_y$ and $s_I$ are the grid spacings (interval lengths) along these dimensions and $[\,\cdot\,]$ denotes the closest integer operator.

Thus, in the example the mapping processor 211 determines a matching input set/bin that has spatial intervals corresponding to the image position of the pixel and an intensity interval that corresponds to the intensity value of the pixel in the reference low dynamic range image at that position.

The mapping processor 211 then proceeds to determine an output high dynamic range pixel value for the matching input set/bin in response to a high dynamic range pixel value for the position in the reference HDR image.

Specifically, during the construction of the grid, both an intensity value D and a weight value W are updated for each new position considered:

$$D(I_x, I_y, I_I) = D(I_x, I_y, I_I) + V_{HDR}(x, y),$$

$$W(I_x, I_y, I_I) = W(I_x, I_y, I_I) + 1.$$

After all pixels of the images have been evaluated, the intensity value is normalized by the weight value to result in the output HDR value B for the bin:

$$B = D / W,$$

where the data value B for each bin contains an output HDR pixel value corresponding to the position and input intensity for the specific bin/input set. Thus, the position within the grid is determined by the reference LDR image whereas the data stored in the grid corresponds to the reference HDR image. Thus, the mapping input sets are determined from the reference LDR image and the mapping output data is determined from the reference HDR image. In the specific example, the stored output HDR value is an average of the HDR values of pixels falling within the input set/bin but it will be appreciated that in other embodiments, other and in particular more advanced approaches may be used.
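The grid construction described above (bin lookup, accumulation of D and W, normalization to B) can be sketched as follows for a single color component. Several details are assumptions made for illustration: the grid is stored sparsely in dictionaries rather than as a dense NX×NY×NI array, ties in the closest-integer operator round upwards, and the function names are hypothetical.

```python
import math


def nearest(value):
    """Closest-integer operator [ ]; rounding halves up is an assumption."""
    return int(math.floor(value + 0.5))


def build_mapping(ref_ldr, ref_hdr, sx, sy, si):
    """Return {(Ix, Iy, II): B}, with B the mean reference HDR value of the bin."""
    D = {}  # accumulated HDR intensity per bin
    W = {}  # number of pixels that fell into each bin
    for y, (ldr_row, hdr_row) in enumerate(zip(ref_ldr, ref_hdr)):
        for x, (v_ldr, v_hdr) in enumerate(zip(ldr_row, hdr_row)):
            key = (nearest(x / sx), nearest(y / sy), nearest(v_ldr / si))
            D[key] = D.get(key, 0.0) + v_hdr
            W[key] = W.get(key, 0) + 1
    # Normalization step: B = D / W for every bin that received a pixel.
    return {key: D[key] / W[key] for key in D}


# Toy 4x2 reference pair: a dark left half and a bright right half.
ref_ldr = [[10, 12, 200, 202],
           [10, 12, 200, 202]]
ref_hdr = [[100, 120, 3000, 3200],
           [100, 120, 3000, 3200]]

mapping = build_mapping(ref_ldr, ref_hdr, sx=4, sy=4, si=64)
```

With these spacings the four dark pixels fall into bin (0, 0, 0) and the four bright pixels into bin (1, 0, 3), so each bin stores the average of the HDR values that landed in it.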

In the example, the mapping is automatically generated to reflect the spatial and pixel value relationships between the reference LDR and HDR images. This is particularly useful for prediction of the HDR image from the LDR image when the reference images are closely correlated with the LDR and HDR images being encoded. This may particularly be the case if the reference images are indeed the same images as those being encoded. In this case, a mapping is generated which automatically adapts to the specific relationships between the input LDR and HDR images. Thus, whereas the relationship between these images typically cannot be known in advance, the described approach automatically adapts to the relationship without any prior information. This allows an accurate prediction which results in fewer differences relative to the input HDR image, and thus in a residual image that can be encoded more effectively.

In embodiments where the input images being encoded are directly used to generate the mapping, these images will generally not be available at the decoder end. Therefore, the decoder cannot generate the mapping by itself. Accordingly, in some embodiments, the encoder may further be arranged to include data that characterizes at least part of the mapping in the output encoded stream. For example, in scenarios where fixed and predetermined input set intervals (i.e. fixed bins) are used, the encoder may include all the bin output values in the output encoded stream, e.g. as part of the optional layer. Although this may increase the data rate, it is likely to be a relatively low overhead due to the subsampling performed when generating the grid. Thus, the data reduction achieved from using an accurate and adaptive prediction approach is likely to outweigh any increase in the data rate resulting from the communication of the mapping data.

When generating the predicted image, the predictor 209 may proceed to step through the image one pixel at a time. For each pixel, the spatial position and the intensity value for the pixel in the LDR image are used to identify a specific input set/bin for the mapping. Thus, for each pixel, a bin is selected based on the spatial position and the LDR image value for the pixel. The output HDR pixel value for this input set/bin is then retrieved and may in some embodiments be used directly as the image value for the pixel. However, as this will tend to provide a certain blockiness due to the spatial subsampling of the mapping, the high dynamic range pixel value will in many embodiments be generated by interpolation between output high dynamic range pixel values from a plurality of input bins. For example, the values from neighboring bins (in both the spatial and non-spatial directions) may also be extracted and the pixel value may be generated as an interpolation of these.

Specifically, the predicted HDR image can be constructed by slicing in the grid at the fractional positions dictated by the spatial coordinates and the LDR image:

$$V_{HDR} = F_{int}\left( B\left( x / s_x,\; y / s_y,\; I / s_I \right) \right),$$

where $F_{int}$ denotes an appropriate interpolation operator, such as nearest neighbor or bicubic interpolation.
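The slicing operation can be sketched as below. The interpolation operator is taken to be trilinear interpolation, which is an assumption made here for brevity; the text names nearest neighbor and bicubic interpolation as examples. The grid is assumed to be dense with every bin filled, positions outside the grid are clamped to its edge, and the function names are hypothetical.

```python
def axis_weights(g, n):
    """Two neighbouring grid indices along one axis, with linear weights.

    g is the fractional grid position and n the number of bins on the axis.
    """
    g = min(max(g, 0.0), float(n - 1))  # clamp to the grid
    lo = int(g)
    hi = min(lo + 1, n - 1)
    frac = g - lo
    return ((lo, 1.0 - frac), (hi, frac))


def interpolate(B, gx, gy, gi):
    """Trilinear interpolation of the dense grid B[ix][iy][ii]."""
    total = 0.0
    for ix, wx in axis_weights(gx, len(B)):
        for iy, wy in axis_weights(gy, len(B[0])):
            for ii, wi in axis_weights(gi, len(B[0][0])):
                total += wx * wy * wi * B[ix][iy][ii]
    return total


def predict_hdr(ldr, B, sx, sy, si):
    """Slice the grid at (x/sx, y/sy, I/sI) for every LDR pixel."""
    return [[interpolate(B, x / sx, y / sy, v / si)
             for x, v in enumerate(row)]
            for y, row in enumerate(ldr)]


# Toy grid: one spatial block, two intensity bins holding HDR values 0 and 1000.
B = [[[0.0, 1000.0]]]
ldr = [[0, 64, 128]]
predicted = predict_hdr(ldr, B, sx=16, sy=16, si=128)
```

The LDR value 64 lies halfway between the two intensity bins, so its predicted HDR value is the midpoint of the two stored values rather than a jump to either one, which is how interpolation reduces blockiness.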

In many scenarios the images may be represented by a plurality of color components (e.g. RGB or YUV) and the described process may be applied separately to each of the color channels. In particular, the output high dynamic range pixel values may contain one value for each of the color components.

Examples of generation of a mapping are provided in FIGS. 4 and 5. In the examples, the LDR-HDR mapping relation is established using LDR and HDR training images and the position in the mapping table is determined by the horizontal (x) and vertical (y) pixel positions in the image as well as by a combination of LDR pixel values, such as the luminance (Y) in the example of FIG. 4 and the entropy (E) in the example of FIG. 5. As previously described, the mapping table stores the associated HDR training data at the specified location.

The encoder 109 thus generates an encoded signal which comprises the encoded low dynamic range image. This image may specifically be included in a mandatory or base layer of the encoded bitstream. In addition, data is included that allows an efficient generation of an HDR image at the decoder based on the encoded LDR image.

In some embodiments, such data may include or be in the form of mapping data that can be used by the decoder. However, in other embodiments, no such mapping data is included for some or all of the images. Instead, the decoder may itself generate the mapping data from previous images.

The generated encoded signal may further comprise residual image data for the low dynamic range image where the residual image data is indicative of a difference between a desired high dynamic range image corresponding to the low dynamic range image and a predicted high dynamic range image resulting from application of the mapping to the encoded low dynamic range image. The desired high dynamic range image is specifically the input HDR image, and thus the residual image data represents data that can modify the decoder generated HDR image to more closely correspond to the desired HDR image, i.e. to the corresponding input HDR image.

The additional residual image data may in many embodiments advantageously be included in an optional layer (e.g. an enhancement layer) that may be used by suitably equipped decoders and ignored by legacy decoders that do not have the required functionality.

The approach may for example allow the described mapping based prediction to be integrated in new backwards-compatible HDR video formats. For example, both layers may be encoded using conventional operations of data transformations (e.g. wavelet, DCT) followed by quantization. Intra- and motion-compensated inter-frame predictions can improve the coding efficiency. In such an approach, inter-layer prediction from LDR to HDR complements the other predictions and further improves the coding efficiency of the enhancement layer.

The signal may specifically be a bit stream that may be distributed or communicated, e.g. over a network as in the example of FIG. 1. In some scenarios, the signal may be stored on a suitable storage medium such as a magneto/optical disc. E.g. the signal may be stored on a DVD or BluRay™ disc.

In the previous example, information on the mapping was included in the output bit stream thereby enabling the decoder to reproduce the prediction based on the received image. In this and other cases, it may be particularly advantageous to use a subsampling of the mapping.

Indeed, a spatial subsampling may advantageously be used such that a separate output value is not stored for each individual pixel but rather is stored for groups of pixels and in particular regions of pixels. In the specific example a separate output value is stored for each macro-block.

Alternatively or additionally, a subsampling of the input non-spatial dimensions may be used. In the specific example, each input set may cover a plurality of possible intensity values in the LDR images thereby reducing the number of possible bins. Such a subsampling may correspond to applying a coarser quantization prior to the generation of the mapping.

Such spatial or value subsampling may substantially reduce the data rate required to communicate the mapping. However, additionally or alternatively it may substantially reduce the resource requirements for the encoder (and corresponding decoder). For example, it may substantially reduce the memory resource required to store the mappings. It may also in many embodiments reduce the processing resource required to generate the mapping.
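The effect of subsampling on the size of the mapping can be illustrated by counting bins. The figures below assume a 1920×1080 frame, 256 LDR intensity levels and 16×16 pixel macro-blocks; the frame and block sizes are assumptions, and the coarser intensity step of 4 is an arbitrary value chosen for the example. The function `bin_count` is hypothetical.

```python
def bin_count(width, height, spatial_step, levels, intensity_step):
    """Number of bins in the grid for the given subsampling steps."""
    nx = -(-width // spatial_step)     # ceiling division
    ny = -(-height // spatial_step)
    ni = -(-levels // intensity_step)
    return nx * ny * ni


per_pixel = bin_count(1920, 1080, 1, 256, 1)          # no subsampling at all
per_macroblock = bin_count(1920, 1080, 16, 256, 1)    # spatial subsampling only
coarse_intensity = bin_count(1920, 1080, 16, 256, 4)  # plus intensity subsampling
```

Under these assumptions, storing one value per macro-block instead of one per pixel shrinks the table by a factor of roughly 254, and quantizing the intensity axis by a further factor of 4 shrinks it fourfold again.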

In the example, the generation of the mapping was based on the current images, i.e. on the LDR and corresponding HDR image being encoded. However, in other embodiments the mapping may be generated using a previous image of the low dynamic range video sequence as the reference low dynamic range image and a previous high dynamic range image generated for the previous low dynamic range video sequence as the reference high dynamic range image (or in some cases the corresponding previous input HDR image). Thus, in some embodiments, the mapping used for the current image may be based on previous corresponding LDR and HDR images.

As an example the video sequence may comprise a sequence of images of the same scene and accordingly the differences between consecutive images are likely to be low. Therefore, the mapping that is appropriate for one image is highly likely to also be appropriate for the subsequent image. Therefore, a mapping generated using the previous LDR and HDR images as reference images is highly likely to also be applicable to the current image. An advantage of using a mapping for the current image based on a previous image is that the mapping can be independently generated by the decoder as this also has the previous images available (via the decoding of these). Accordingly, no information on the mapping needs to be included, and therefore the data rate of the encoded output stream can be reduced further.

A specific example of an encoder using such an approach is illustrated in FIG. 6. In this example, the mapping (which in the specific example is a Look Up Table, LUT) is constructed on the basis of the previous (delay τ) reconstructed LDR and the previous reconstructed (delay τ) HDR frame both on the encoder and decoder side. In this scenario no mapping values need to be transmitted from the encoder to the decoder. Rather, the decoder merely copies the HDR prediction process using data that is already available to it. Although the quality of the interlayer prediction may be slightly degraded, this will typically be minor because of the high temporal correlation between subsequent frames of a video sequence. In the example, a yuv420 color scheme is used for LDR images and a yuv444/422 color scheme is used for HDR images (and consequently the generation and application of the LUT (mapping) is preceded by a color up-conversion).

It is preferred to keep the delay τ as small as possible in order to increase the likelihood that the images are as similar as possible. However, the minimum value may in many embodiments depend on the specific encoding structure used as it requires the decoder to be able to generate the mapping from already decoded pictures. Therefore, the optimal delay may depend on the type of GOP (Group Of Pictures) used and specifically on the temporal prediction (motion compensation) used. For example, for an IPPPP GOP, τ can be a single image delay whereas for an IBPBP GOP it will be at least two images.
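The use of a delayed reference pair may be sketched as below. This is a simplified illustration: the spatial dimensions of the mapping are omitted, frames are flat lists of values, and the names `build_lut` and `predict_frame` as well as the fallback for empty bins are assumptions of the sketch. Because both functions use only reconstructed frames, an encoder and a decoder running the same code would obtain the same prediction without any mapping data being transmitted.

```python
def build_lut(ref_ldr, ref_hdr, value_step=4):
    """Average the reference HDR values per LDR intensity bin (assumed rule)."""
    sums, counts = {}, {}
    for v_ldr, v_hdr in zip(ref_ldr, ref_hdr):
        b = v_ldr // value_step
        sums[b] = sums.get(b, 0.0) + v_hdr
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def predict_frame(t, recon_ldr, recon_hdr, tau=1, value_step=4):
    """Predict HDR frame t from LDR frame t and the reconstructed pair at t - tau.

    recon_ldr: list of reconstructed LDR frames up to and including frame t.
    recon_hdr: list of reconstructed HDR frames up to at least frame t - tau.
    """
    if t - tau < 0:
        raise ValueError("no reconstructed reference frame available yet")
    lut = build_lut(recon_ldr[t - tau], recon_hdr[t - tau], value_step)
    # Assumed fallback: bins never seen in the reference pass the LDR value on.
    return [lut.get(v // value_step, float(v)) for v in recon_ldr[t]]
```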

In the example, each position of the LDR contributed to only one inputset/bin of the grid. However, in other embodiments the mapping processormay identify a plurality of matching input sets for at least oneposition of the at least a group of image positions used to generate themapping. The output high dynamic range pixel value for all the matchinginput sets may then be determined in response to the high dynamic rangepixel value for the position in the reference high dynamic range image.

Specifically, rather than using nearest neighbor interpolation to build the grid, the individual data can also be spread over neighboring bins rather than just the single best matching bin. In this case, each pixel does not contribute to a single bin but contributes to e.g. all its neighboring bins (8 in the case of a 3D grid). The contribution may e.g. be inversely proportional to the three dimensional distance between the pixel and the neighboring bin centers.
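A minimal sketch of such a spreading over the 8 neighboring bins of a 3D grid is given below. The function name `splat`, the small constant `eps` that avoids a division by zero, the normalization by the accumulated weights and the skipping of neighbors outside the grid are assumptions of this sketch.

```python
import math
from itertools import product


def splat(samples, steps, shape, eps=1e-6):
    """Spread each reference sample over its 8 neighboring bins.

    samples: iterable of (x, y, v_ldr, v_hdr).
    steps:   (sx, sy, sv) bin sizes along x, y and LDR intensity.
    shape:   (nx, ny, nv) number of bins along each dimension.
    Weights are inversely proportional to the 3D distance (in bin units)
    between the sample and each neighboring bin center.
    """
    num, den = {}, {}
    for x, y, v, v_hdr in samples:
        # Position in bin units, with bin centers at integer coordinates.
        g = [c / s - 0.5 for c, s in zip((x, y, v), steps)]
        base = [math.floor(c) for c in g]
        for offset in product((0, 1), repeat=3):
            key = tuple(b + o for b, o in zip(base, offset))
            if any(k < 0 or k >= n for k, n in zip(key, shape)):
                continue  # neighbor lies outside the grid
            w = 1.0 / (math.dist(g, key) + eps)
            num[key] = num.get(key, 0.0) + w * v_hdr
            den[key] = den.get(key, 0.0) + w
    # Weighted average of all contributions received by each bin.
    return {k: num[k] / den[k] for k in num}
```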

FIG. 7 illustrates an example of a complementary decoder 115 to theencoder of FIG. 2 and FIG. 8 illustrates an example of a method ofoperation therefor. The decoder 115 comprises a receive circuit 701which performs step 801 wherein it receives the encoded data from thereceiver 113. In the specific example where LDR encoded data andresidual data is encoded in different layers, the receive circuit isarranged to extract and demultiplex the LDR encoded data and theoptional layer data in the form of the residual image data. Inembodiments wherein the information on the mapping is included in thereceived bitstream, the receive circuit 701 may further extract thisdata.

The receive circuit 701 is coupled to an LDR decoder 703 which receives the encoded LDR data. It then proceeds to perform step 803 wherein the LDR image is decoded. The LDR decoder 703 will be complementary to the LDR encoder 205 of the encoder 109 and may specifically be an H.264/AVC standard decoder.

The LDR decoder 703 is coupled to a decode predictor 705 which receivesthe decoded LDR image. The decode predictor 705 is further coupled to adecode mapping processor 707 which is arranged to perform step 805wherein a mapping is generated for the decode predictor 705.

The decode mapping processor 707 generates the mapping to correspond tothat used by the encoder when generating the residual image data. Insome embodiments, the decode mapping processor 707 may simply generatethe mapping in response to mapping data received in the encoded datastream. For example, the output data value for each bin of the grid maybe provided in the received encoded data stream.

The decode predictor 705 then proceeds to perform step 807 wherein a predicted HDR image is generated from the decoded LDR image and the mapping generated by the decode mapping processor 707. The prediction may follow the same approach as that used in the encoder.

For brevity and clarity, the example will focus on the simplifiedexample wherein the encoder is based only on the LDR to HDR prediction,and thus where an entire LDR to HDR prediction image (and thus an entireresidual image) is generated. However, it will be appreciated that inother embodiments, the approach may be used with other predictionapproaches, such as temporal or spatial predictions. In particular, itwill be appreciated that rather than apply the described approach to thewhole image, it may be applied only to image regions or blocks whereinthe LDR to HDR prediction was selected by the encoder.

FIG. 9 illustrates a specific example of how a prediction operation may be performed.

In step 901 a first pixel position in the HDR image is selected. For this pixel position an input set for the mapping is then determined in step 903, i.e. a suitable input bin in the grid is determined. This may for example be determined by identifying the grid bin covering the spatial interval in which the position falls and the intensity interval in which the decoded pixel value of the decoded LDR image falls. Step 903 is then followed by step 905 wherein an output value for the input set is retrieved from the mapping. E.g. a LUT may be addressed using the determined input set data and the resulting output data stored for that addressing is retrieved.

Step 905 is then followed by step 907 wherein the pixel value for thepixel is determined from the retrieved output. As a simple example, thepixel value may be set to the retrieved value. In more complexembodiments, the pixel value may be generated by interpolation of aplurality of output values for different input sets (e.g. consideringall neighbor bins as well as the matching bin).

This process may be repeated for all positions in the HDR image and for all color components thereby resulting in a predicted HDR image being generated.
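The steps 901 to 907 may be sketched as follows for the simple case where the pixel value is set directly to the retrieved output value. The function name `predict_hdr`, the dictionary representation of the LUT and the handling of empty bins are assumptions of this sketch; the step sizes must match those used when the mapping was generated.

```python
def predict_hdr(ldr, mapping, spatial_step=16, value_step=4):
    """Generate a predicted HDR image from a decoded LDR image.

    ldr:     2-D list of decoded integer LDR values.
    mapping: dict keyed by (x_bin, y_bin, intensity_bin) -> HDR value.
    """
    predicted = []
    for y, row in enumerate(ldr):          # steps 901/909: visit each position
        out_row = []
        for x, v in enumerate(row):
            # Step 903: determine the input set (bin) for this position/value.
            key = (x // spatial_step, y // spatial_step, v // value_step)
            # Step 905: retrieve the output value stored for that input set.
            # Assumed fallback for bins without data: pass the LDR value on.
            out = mapping.get(key, float(v))
            # Step 907: simplest case, use the retrieved value directly.
            out_row.append(out)
        predicted.append(out_row)
    return predicted
```

A more elaborate variant would replace the direct assignment in step 907 by an interpolation between the matching bin and its neighbor bins.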

The decoder 115 then proceeds to generate an output HDR image based on the predicted HDR image.

In the specific example, the output HDR image is generated by taking thereceived residual image data into account. Thus the receive circuit 701is coupled to a residual decoder 709 which receives the residual imagedata and which proceeds to perform step 809 wherein the residual imagedata is decoded to generate a decoded residual image.

The residual decoder 709 is coupled to a combiner 711 which is furthercoupled to the decode predictor 705. The combiner 711 receives thepredicted HDR image and the decoded residual HDR image and proceeds toperform step 811 wherein it combines the two images to generate theoutput HDR image. Specifically, the combiner may add pixel values forthe two images on a pixel by pixel basis to generate the output HDRimage.
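The pixel-by-pixel combination, together with the complementary residual generation at the encoder, may be sketched as below. The function names are assumptions, and the residual is assumed here to be conveyed without loss; with lossy residual coding the output would only approximate the input HDR image.

```python
def make_residual(hdr, predicted):
    """Encoder side: residual = input HDR image minus predicted HDR image."""
    return [[h - p for h, p in zip(h_row, p_row)]
            for h_row, p_row in zip(hdr, predicted)]


def combine(predicted, residual):
    """Decoder side (step 811): add the two images pixel by pixel."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]
```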

The combiner 711 is coupled to an output circuit 713 which performs step 813 in which an output signal is generated. The output signal may for example be a display drive signal which can drive a suitable display, such as a television, to present the HDR image.

In the specific example, the mapping was determined on the basis of data included in the encoded data stream. However, in other embodiments, the mapping may be generated in response to previous images that have been received by the decoder, such as e.g. the previous image of the video sequence. For this previous image, the decoder will have an LDR image resulting from the LDR decoding and this may be used as the reference LDR image. In addition, an HDR image has been generated by prediction followed by further correction of the predicted image using the residual image data. Thus, the generated HDR image closely corresponds to the input HDR image of the encoder and may accordingly be used as the reference HDR image. Based on these two reference images, the exact same approach as that used by the encoder may be used to generate a mapping by the decoder. Accordingly, this mapping will correspond to that used by the encoder and will thus result in the same prediction (and thus the residual image data will accurately reflect the difference between the decoder predicted image and the input HDR image at the encoder).

The approach thus provides a backwards compatible HDR encoding startingfrom a standard LDR encoding, which may e.g. use a “non-optimal”subrange selection of all luminances available in the scene for optimalcontrast, via an LDR tone mapping (e.g. a quick rising S-curve withblack and white clipping). The approach then adds additional data toallow reconstruction of the optimally encoded HDR image (withpotentially another tone mapping for better quality visual effect: e.g.dark grays may be pushed deeper than in the LDR coding).

This may e.g. result in the following differences between HDR and LDR:

-   higher precision for the same values (e.g. L=27.5 instead of 27), which could also be recoded with a scale and offset (e.g. 55=2×27.5+0)
-   encoding of white and black subpictures that have been lost in the clipping
-   shifting of at least some grays in the image (e.g. darken the 18% grays) to give a better visual rendering on a typical higher peak brightness display.

The approach uses a prediction of this HDR signal from the available LDR data, so that the required residual information is reduced.

The approach uses an improved characterization of the mapping fromdifferent LDR values to HDR values automatically taking into accountthings that happen to all underlying object colors (e.g. a part of atext character in the block overlapping several objects etc.).

The described example ignores the actual per-pixel fine accuracy spatial profile, but by using the “local average”, our “all-colors-adaptive” approach will typically result in better prediction (e.g. on either side of edges by using the input LDR value as a rough index to look up the corresponding bin which then yields the approximate HDR value needed). This results in a good object-in-HDR average starting value for any such object possibly present, thus requiring a smaller residue.

Specifically, a mapping grid is constructed, e.g. subsampled in space(since only the local averages are used and not the exact geometric HDRmicroprofile), and with an HDR value for each possible LDR value (orcombination of color coordinates). In some embodiments a valuesubsampling may also be performed e.g. with an HDR value per step of 4luminance codings of the LDR.

The described approach may provide a particularly efficient adaptationof the mapping to the specific local characteristics and may in manyscenarios provide a particularly accurate prediction. This may beillustrated by the example of FIG. 10 which illustrates relationshipsbetween the luminance for the LDR image Y_LDR and the luminance forcorresponding HDR image Y_HDR. FIG. 10 illustrates the relationship fora specific macro-block which happens to include elements of threedifferent objects. As a consequence the pixel luminance relations(indicated by dots) are located in three different clusters 1001, 1003,1005.

The algorithms of the prior art will perform a linear regression on therelationship thereby generating a linear relationship between the LDRluminance values and the HDR luminance values, such as e.g. the oneindicated by the line 1007. However, such an approach will providerelatively poor mapping/prediction for at least some of the values, suchas those belonging to the image object of cluster 1003.

In contrast, the approach described above will generate a much more accurate mapping such as the one indicated by line 1009. This mapping will much more accurately reflect the characteristics and suitable mapping for all of the clusters and will thus result in an improved prediction. Since the mapping, e.g. as realized as a lookup table, can have any non-linear shape (e.g. by mere selection of the particular spatially subsampled positions, which need not fall on a regular grid), one may summarize a complex local image structure at the sending side, e.g. by means of image analysis techniques such as clustering, segmentation, edge detection, etc. Embodiments which send no such mapping to the other side (even if only during some critical shots) may not have such full flexibility if the mappings are to be estimated at the receiving side, but could still allow some flexibility, especially if the lookup tables are additionally partially corrected with upgrading data (e.g. the receiving end mapping may be generated from a spatially regular grid but may be supplemented or corrected for a couple of difficult to estimate clusters, etc.). Indeed, the mapping may not only provide accurate results for luminances corresponding to the clusters but can also accurately predict relationships for luminances in between, such as for the interval indicated by 1011. Such mappings can be obtained by interpolation. This may be useful when e.g. postprocessing an image.
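The difference between a single linear regression and a binned mapping may be illustrated with a small synthetic example. The luminance values below are invented purely for illustration and are not taken from FIG. 10; the function names and the least-squares fit are likewise assumptions of this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def fit_lut(xs, ys, value_step=4):
    """Per-bin average of the HDR luminance (binned mapping)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        b = x // value_step
        sums[b] = sums.get(b, 0.0) + y
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def mean_abs_error(pred, ys):
    return sum(abs(p - y) for p, y in zip(pred, ys)) / len(ys)


# Synthetic macro-block with three clusters of (Y_LDR, Y_HDR) pairs.
y_ldr = [20, 21, 22, 100, 101, 102, 200, 201, 202]
y_hdr = [100, 100, 100, 3000, 3000, 3000, 1000, 1000, 1000]

slope, intercept = fit_line(y_ldr, y_hdr)
lut = fit_lut(y_ldr, y_hdr)
line_error = mean_abs_error([slope * x + intercept for x in y_ldr], y_hdr)
lut_error = mean_abs_error([lut[x // 4] for x in y_ldr], y_hdr)
```

In this constructed case the binned mapping reproduces every cluster, whereas no single line can pass through all three, so `line_error` exceeds `lut_error`.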

Furthermore, such accurate mapping information can be determinedautomatically by simple processing based on reference images (and in thespecific case based on two reference macro blocks). E.g., metadata mayfurther specify which (parts of) reference images are morecompression-economical to use (e.g. a certain object in a particularimage may be well-graded and form a good basis for further predictionand correction). In addition, the accurate mapping can be determinedindependently by an encoder and a decoder based on previous images andthus no information of the mapping needs to be included in the datastream. Thus, overhead of the mapping may be minimized.

In the previous example, the approach was used as part of a decoder foran HDR image. However, it will be appreciated that the principles may beused in many other applications and scenarios. For example, the approachmay be used to simply generate an HDR image from an LDR image. Forexample, suitable local reference images may be selected locally andused to generate a suitable mapping. The mapping may then be applied tothe LDR image to generate an HDR image (e.g. using interpolation). Theresulting HDR image may then be displayed on an HDR display.

Also, it will be appreciated that the decoder in some embodiments maynot consider any residual data (and thus that the encoder need notgenerate the residual data). Indeed, in many embodiments the HDR imagegenerated by applying the mapping to the decoded LDR image may be useddirectly as the output HDR image without requiring any furthermodification or enhancement.

The described approach may be used in many different applications andscenarios and may for example be used to dynamically generate real-timeHDR video signals from LDR video signals. For example, the decoder 115may be implemented in a set-top box or other apparatus having an inputconnector receiving the video signal and an output connector outputtingan HDR video signal that can be displayed on a suitable high dynamicrange display.

As a specific example, a video signal as described may be stored on aBluray™ disc which is read by a Bluray™ player. The Bluray™ player maybe connected to the set-top box via an HDMI cable and the set-top boxmay then generate the HDR image. The set-top box may be connected to adisplay (such as a television) via another HDMI connector.

In some scenarios, the decoder or HDR image generation functionality may be included as part of a signal source, such as a Bluray™ player or other media player. As another alternative, the functionality may be implemented as part of a display, such as a computer monitor or television. Thus, the display may receive an LDR stream that can be modified to provide HDR images. Hence, a signal source, such as a media player, or a display, such as a computer monitor or television, which delivers a significantly improved user experience can be provided.

The described approach may be applied to each individual color channelfor an image. For example, for an RGB image, the approach may beindividually applied to each of the R, G and B channels. However, insome embodiments, the combination value used for the mapping input maybe a luminance value whereas the output data may be an individual colorcomponent value. For example, the RGB value for a given pixel may becombined into a single luminance value whereas individual HDR outputpixel values are stored in the grid for each individual color channel.

Indeed, in practice, the LDR images are often generated from HDR imagesby means of unknown tone-mapping and color grading operations. Theinventors have realized that the relationship between the individualcolor components for the LDR and HDR images may often be betterpredicted from the LDR luminance information rather than from the LDRcolor data. Therefore, in many embodiments, it is beneficial to use theluminance of the LDR signal for the intensity coordinates even whenconstructing the grid for color components, such as U and V. In otherwords, V_(LDR) in the previous equation may be set to the luminancevalue Y_(LDR) for all color components. Thus, the same grid may be usedfor all color channels with each bin storing an output HDR value foreach color channel.
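A sketch of a single grid indexed by the LDR luminance and storing one HDR output value per color channel is given below. The function names, the integer approximation of the Rec. 601 luma weights and the use of a per-bin average are assumptions of this sketch; RGB channels are used for brevity in place of Y, U and V.

```python
def luma(rgb):
    """Integer luminance from 8-bit RGB (Rec. 601 weights, scaled by 1000)."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def build_shared_grid(ldr_rgb, hdr_rgb, spatial_step=16, value_step=4):
    """One grid for all channels: the bin is selected by position and LDR
    luminance, and each bin stores an average HDR value per color channel."""
    sums, counts = {}, {}
    for y, (l_row, h_row) in enumerate(zip(ldr_rgb, hdr_rgb)):
        for x, (l_pix, h_pix) in enumerate(zip(l_row, h_row)):
            key = (x // spatial_step, y // spatial_step,
                   luma(l_pix) // value_step)
            acc = sums.setdefault(key, [0.0, 0.0, 0.0])
            for c in range(3):
                acc[c] += h_pix[c]
            counts[key] = counts.get(key, 0) + 1
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}
```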

In the specific described examples, the input data for the mappingsimply consisted in two spatial dimensions and a single pixel valuedimension representing an intensity value that may e.g. correspond to aluminance value for the pixel or to a color channel intensity value.

However, more generally the mapping input may comprise a combination ofcolor coordinates for pixels of a LDR image. Each color coordinate maysimply correspond to one value of a pixel, such as to one of the R, Gand B values of an RGB signal or to one of the Y, U, V values of a YUVsignal. In some embodiments, the combination may simply correspond tothe selection of one of the color coordinate values, i.e. it maycorrespond to a combination wherein all color coordinates apart from theselected color coordinate value are weighted by zero weights.

In other embodiments, the combination may be of a plurality of colorcoordinates for a single pixel. Specifically, the color coordinates ofan RGB signal may simply be combined to generate a luminance value. Inother embodiments, more flexible approaches may be used such as forexample a weighted luminance value where all color channels areconsidered but the color channel for which the grid is developed isweighted higher than the other color channels.

In some embodiments, the combination may take into account pixel valuesfor a plurality of pixel positions. For example, a single luminancevalue may be generated which takes into account not only the luminanceof the pixel for the position being processed but which also takes intoaccount the luminance for other pixels.

Indeed, in some embodiments, combination values may be generated whichdo not only reflect characteristics of the specific pixel but alsocharacteristics of the locality of the pixel and specifically of howsuch characteristics vary around the pixel.

As an example, a luminance or color intensity gradient component may beincluded in the combination. E.g. the combination value may be generatedtaking into account the difference between luminance of the currentpixel value and the luminances of each of the surrounding pixels.Further the difference to the luminances to the pixels surrounding thesurrounding pixels (i.e. the next concentric layer) may be determined.The differences may then be summed using a weighted summation whereinthe weight depends on the distance to the current pixel. The weight mayfurther depend on the spatial direction, e.g. by applying opposite signsto differences in opposite directions. Such a combined difference basedvalue may be considered to be indicative of a possible luminancegradient around the specific pixel.
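One possible form of such a difference-based value is sketched below. The function name `gradient_value`, the choice of a 1/distance weight, the use of two concentric layers (`radius=2`) and the return of separate horizontal and vertical components are assumptions of this sketch.

```python
import math


def gradient_value(lum, x, y, radius=2):
    """Weighted sum of luminance differences around pixel (x, y).

    lum: 2-D list of luminance values.
    Each difference is weighted by 1/distance and projected onto the
    horizontal and vertical directions, so that differences in opposite
    directions receive opposite signs.
    """
    height, width = len(lum), len(lum[0])
    gx = gy = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # neighbor lies outside the image
            dist = math.hypot(dx, dy)
            diff = lum[ny][nx] - lum[y][x]
            # (dx / dist) is the direction sign, (1 / dist) the distance weight.
            gx += diff * dx / dist ** 2
            gy += diff * dy / dist ** 2
    return gx, gy
```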

Thus, applying such a spatially enhanced mapping may allow the HDR image generated from an LDR image to take spatial variations into account thereby allowing it to more accurately reflect such spatial variations.

As another example, the combination value may be generated to reflect a texture characteristic for the image area including the current pixel position. Such a combination value may e.g. be generated by determining a pixel value variance over a small surrounding area. As another example, repeating patterns may be detected and considered when determining the combination value.

Indeed, in many embodiments, it may be advantageous for the combinationvalue to reflect an indication of the variations in pixel values aroundthe current pixel value. For example, the variance may directly bedetermined and used as an input value.

As another example, the combination may be a parameter such as a localentropy value. The entropy is a statistical measure of randomness thatcan e.g. be used to characterize the texture of the input image. Anentropy value H may for example be calculated as:

${{H(I)} = {- {\sum\limits_{j = 1}^{n}{{p\left( I_{j} \right)}\log_{b}{p\left( I_{j} \right)}}}}},$

where p( ) denotes the probability density function for the pixel values I_(j) in the image I. This function can be estimated by constructing the local histogram over the neighborhood being considered (in the above equation, n neighboring pixels). The base of the logarithm b is typically set to 2.
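A minimal sketch of the local entropy computation is given below. It estimates p( ) from the histogram of the neighborhood values and, as an interpretation made for this sketch, evaluates the sum once per distinct value in that histogram, which is the usual form of the Shannon entropy. The function name `local_entropy` is an assumption.

```python
import math
from collections import Counter


def local_entropy(values, base=2):
    """Entropy of a neighborhood of pixel values.

    values: non-empty list of the (quantized) pixel values in the neighborhood.
    p( ) is estimated as the relative frequency of each distinct value.
    """
    n = len(values)
    histogram = Counter(values)
    return -sum((count / n) * math.log(count / n, base)
                for count in histogram.values())
```

A flat neighborhood yields an entropy of zero, whereas a neighborhood in which four different values occur equally often yields 2 bits for b = 2.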

It will be appreciated that in embodiments wherein a combination valueis generated from a plurality of individual pixel values, the number ofpossible combination values that are used in the grid for each spatialinput set may possibly be larger than the total number of pixel valuequantization levels for the individual pixel. E.g. the number of binsfor a specific spatial position may exceed the number of possiblediscrete luminance values that a pixel can attain. However, the exactquantization of the individual combination value, and thus the size ofthe grid, is best optimized for the specific application.

It will be appreciated that the generation of the HDR image from the LDRimage can be in response to various other features, parameters andcharacteristics.

For example, the generation of the HDR image may be in response to depthinformation associated with the LDR image. Such an approach may inprinciple be used without the described mapping and it is conceivablethat the HDR image can be generated e.g. based only on the LDR image andthe depth information. However, particularly advantageous performancecan be achieved when the LDR to HDR mapping is used together with adepth based prediction.

Therefore in some embodiments the encoder may also include a depth encoder which e.g. encodes a depth map for the LDR image and includes the encoded depth data in the data stream which is transmitted to the decoder. The decoder can then decode the depth map and generate the HDR image in response to the decoded depth map. FIG. 11 illustrates how the decoder of FIG. 7 may be enhanced by the inclusion of a depth decoder 1101 which is fed the encoded depth data from the receive circuit 701 and which then proceeds to decode the data to generate the depth map for the LDR image. The depth map is then fed to the decode predictor 705 where it is used to generate the prediction for the HDR image (or in some examples it may be used to generate an HDR image which is used directly as the output HDR image).

For example, in scenes that are lit by bright focused lights, theforeground objects may often be brighter than objects that are in thebackground. Thus, having knowledge of the depth of a given object, maybe used to determine how the increased dynamic range is utilized. Forexample, foreground objects may be made brighter to exploit theadditional dynamic range of an HDR image whereas background objects maynot necessarily be brightened equivalently as this could potentiallyincrease the perceived significance of background objects more thanintended or realized by the specific lighting of the scene.

The mapping to generate HDR output pixels may thus not only be dependent on the colour combinations and image position but may also be dependent on the depth information at that position. This information may be included in the mapping in different ways. For example, different mapping grids may be generated for the combinations of colour coordinates and for the depth values, and thus for each position a look-up in two look-up tables may be performed. The resulting two HDR values for the given position may then be combined into a single HDR value, e.g. by a simple averaging. As another example, a single look-up table having input sets comprising combinations of colour coordinates and spatial positions and an output in the form of an HDR value may be used (e.g. the same look-up table as in the example of FIG. 7). The depth consideration may then be achieved by a depth dependent adaptation of the input data prior to the table look-up and/or by a depth dependent adaptation of the output HDR value. The functions that are applied to the input and/or output data may be predetermined functions or may e.g. be determined based on previous images.

In some embodiments, the mapping may be implemented as a grid that alsoincludes depth information. For example, each bin may be defined by aninterval for each spatial image dimension, an interval for each colourcoordinate, and an interval for the depth value. Such a table may bepopulated as previously described except that for each pixel position,the bin is further selected such that the depth indication for the pixelposition falls within the depth interval of the bin. Such population mayof course be based on a previous image and depth map and may accordinglybe performed independently but consistently at both the encoder and thedecoder.
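A grid extended by a depth interval may be sketched as follows. The function name `build_depth_grid`, the integer depth representation, the step sizes and the per-bin averaging are assumptions of this sketch.

```python
def build_depth_grid(samples, steps):
    """Populate a grid whose bins also cover a depth interval.

    samples: iterable of (x, y, v_ldr, depth, v_hdr) taken from a reference
             LDR image, its depth map and the reference HDR image.
    steps:   (sx, sy, sv, sd) bin sizes for x, y, LDR intensity and depth.
    """
    sx, sy, sv, sd = steps
    sums, counts = {}, {}
    for x, y, v_ldr, depth, v_hdr in samples:
        # The bin is selected so that the depth indication for the pixel
        # position also falls within the depth interval of the bin.
        key = (x // sx, y // sy, v_ldr // sv, depth // sd)
        sums[key] = sums.get(key, 0.0) + v_hdr
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```

Two pixels with similar LDR values in the same region but at clearly different depths thus populate different bins and can receive different HDR values.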

Other parameters that may be considered in the mapping may includevarious image characteristics such as for example characteristics ofimage objects. For example, it is known that skin tones are verysensitive to manipulation in order for them to maintain a natural look.Therefore, the mapping may particularly take into account whether thecombination of colour coordinates corresponds to skin tones and mayperform a more accurate mapping for such tones.

As another example, the encoder and/or decoder may comprise functionality for extracting and possibly identifying image objects and may adjust the mapping in response to characteristics of such objects. For example, various algorithms are known for detection of faces in an image and such algorithms may be used to adapt the mapping in areas that are considered to correspond to a human face.

Thus, in some embodiments the encoder and/or decoder may comprise meansfor detecting image objects and means for adapting the mapping inresponse to image characteristics of the image objects. In particular,the encoder and/or decoder may comprise means for performing facedetection and means for adapting the mapping in response to facedetection.

It will be appreciated that the mapping may be adapted in many differentways. As a low complexity example, different grids or look-up tables maysimply be used for different areas. Thus, the encoder/decoder may bearranged to select between different mappings in response to the facedetection and/or image characteristics for an image object.

As a specific example, the encoder and/or decoder may in the referenceimages identify any areas that are considered to correspond to humanfaces. For these areas, one look-up table may be generated and a secondlook-up table may be used for other areas. The generation of the twolook-up tables may use different approaches and/or the mapping may bedifferent in the two examples. For example, the mapping may be generatedto include a saturation increase for general areas but not for areasthat correspond to faces. As another example, finer granularity of themapping for face areas may be used than for areas that do not correspondto faces.

Other means of adapting the mapping can be envisaged. For example, insome embodiments the input data sets may be processed prior to themapping. For example, a parabolic function may be applied to colourvalues prior to the table look-up. Such a preprocessing may possibly beapplied to all input values or may e.g. be applied selectively. Forexample, the input values may only be pre-processed for some areas orimage objects, or only for some value intervals. For example, thepreprocessing may be applied only to colour values that fall within askin tone interval and/or to areas that are designated as likely tocorrespond to a face.

Alternatively or additionally, post-processing of the output HDR pixelvalues may be applied. Such post-processing may similarly be appliedthroughout or may be selectively applied. For example, it may only beapplied to output values that correspond to skin tones or may only beapplied to areas considered to correspond to faces. In some systems, thepost-processing may be arranged to partially or fully compensate for apre-processing. For example, the pre-processing may apply a transformoperation with the post-processing applying the reverse transformation.
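The selective application of a pre-processing and a compensating post-processing around the table look-up may be sketched as below. The wrapper name `mapped_value`, the representation of the look-up as a callable and the particular transform pair (a parabolic function and its square-root inverse on values normalized to the range 0 to 1) are assumptions of this sketch.

```python
import math


def mapped_value(v, lookup, pre=None, post=None, selected=True):
    """Apply optional pre- and post-processing around a mapping look-up.

    v:        input value (assumed normalized to the range 0..1).
    lookup:   callable implementing the table look-up.
    selected: whether this pixel belongs to the area or value interval for
              which the pre-/post-processing is enabled (e.g. a face area).
    """
    if selected and pre is not None:
        v = pre(v)
    out = lookup(v)
    if selected and post is not None:
        out = post(out)
    return out


def parabolic(v):
    """Example pre-processing: parabolic function."""
    return v * v


def inverse_parabolic(v):
    """Example post-processing: reverse transformation of `parabolic`."""
    return math.sqrt(v)
```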

As a specific example, the pre-processing and/or post-processing maycomprise a filtering of (one or more) of the input/output values. Thismay in many embodiments provide improved performance and in particularthe mapping may often result in improved prediction. For example thefiltering may result in reduced banding.

As an example of a pre-processing it may in some examples be desirable to apply a color transformation to a suitable color space. Many standard video color spaces (e.g. YCbCr) are only loosely connected to human perception. It may therefore be advantageous to convert the video data into a perceptually uniform color space (color spaces in which a certain step size corresponds to a fixed perceptual difference). Examples of such color spaces include Yu′v′, CIELab or CIELuv. The benefit of such a preprocessing step is that errors resulting from prediction inaccuracies will have a perceptually more uniform effect.

In some embodiments the mapping may be non-uniformly subsampled. Themapping may specifically be at least one of a spatially non-uniformsubsampled mapping; a temporally non-uniform subsampled mapping; and acombination value non-uniform subsampled mapping.

The non-uniform subsampling may be a static non-uniform subsampling or the non-uniform subsampling may be adapted in response to e.g. a characteristic of the combinations of colour coordinates or of an image characteristic.

For example, the colour value subsampling may be dependent on the colour coordinate values. This may for example be static such that bins for colour values corresponding to skin tones may cover much smaller colour coordinate value intervals than bins for colour values that cover other colours.

As another example, a dynamic spatial subsampling may be applied wherein a much finer subsampling is used for areas that are considered to correspond to faces than for areas that are not considered to correspond to faces. It will be appreciated that many other non-uniform subsampling approaches can be used.

As another example, when images contain smooth gradients over a limited luminance range, it may be advantageous to use a finer quantization step for that range to prevent quantization artifacts from becoming visible in the gradient.
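A non-uniform quantization of a value dimension can be sketched as a list of bin edges that are spaced more closely inside a range of interest (for example a skin tone interval or the luminance range of a smooth gradient). The step sizes, range limits and function names below are assumptions chosen for illustration only.

```python
from bisect import bisect_right


def make_edges(lo, hi, fine_lo, fine_hi, coarse_step, fine_step):
    """Return lower bin edges over [lo, hi), using fine_step inside [fine_lo, fine_hi)."""
    edges = []
    v = lo
    while v < hi:
        edges.append(v)
        v += fine_step if fine_lo <= v < fine_hi else coarse_step
    return edges


def bin_index(edges, value):
    """Index of the bin whose interval contains value."""
    return bisect_right(edges, value) - 1
```

With make_edges(0, 256, 64, 128, 32, 8), values between 64 and 128 fall into bins that are 8 levels wide, whereas all other values fall into bins that are 32 levels wide.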

In yet another example, the sampling/quantisation may depend on the focus in the image. This could be derived from sharpness metrics or frequency analysis. For a blurred background the signal prediction does not need to be as accurate as for small bright objects that a camera focuses on. In general, areas that contain few details can be quantized more coarsely, as the piecewise linear approximation offered by the described approach will suffice.

In the previous examples, a three dimensional mapping/grid has been used. However, in other embodiments an N dimensional grid may be used where N is an integer larger than three. In particular, the two spatial dimensions may be supplemented by a plurality of pixel value related dimensions.

Thus, in some embodiments the combination may comprise a plurality of dimensions with a value for each dimension. As a simple example, the grid may be generated as a grid having two spatial dimensions and one dimension for each color channel. E.g. for an RGB image, each bin may be defined by a horizontal position interval, a vertical position interval, an R value interval, a G value interval and a B value interval.
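For the RGB example, a bin of such a five dimensional grid can be identified by a tuple of five interval indices. The sketch below assumes uniform intervals and a sparse dictionary as the grid; the step sizes and names are hypothetical.

```python
def bin_key(x, y, r, g, b, spatial_step=16, value_step=32):
    """Return the 5-D bin index (two spatial dimensions plus one per colour channel)."""
    return (
        x // spatial_step,
        y // spatial_step,
        r // value_step,
        g // value_step,
        b // value_step,
    )


# A sparse grid: only bins that are actually populated take up storage.
grid = {}
grid[bin_key(35, 17, 200, 64, 0)] = (1200.0, 380.0, 5.0)  # example HDR output value
```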

As another example, the plurality of pixel value dimensions may additionally or alternatively correspond to different spatial dimensions. For example, a dimension may be allocated to the luminance of the current pixel and to each of the surrounding pixels.

Such multi-dimensional grids may provide additional information that allows an improved prediction and in particular allows the HDR image to more closely reflect relative differences between pixels.

In some embodiments, the encoder may be arranged to adapt the operation in response to the prediction.

For example, the encoder may generate the predicted HDR image as previously described and may then compare this to the input HDR image. This may e.g. be done by generating the residual image and evaluating this image. The encoder may then proceed to adapt the operation in dependence on this evaluation, and may in particular adapt the mapping and/or the residual image depending on the evaluation.

As a specific example, the encoder may be arranged to select which parts of the mapping to include in the encoded data stream based on the evaluation. For example, the encoder may use a previous set of images to generate the mapping for the current image. The corresponding prediction based on this mapping may be determined and the corresponding residual image may be generated. The encoder may evaluate the residual image to identify areas in which the prediction is considered sufficiently accurate and areas in which the prediction is considered to not be sufficiently accurate. E.g. all pixel values for which the residual image value is lower than a given predetermined threshold may be considered to be predicted sufficiently accurately. Therefore, the mapping values for such areas are considered sufficiently accurate, and the grid values for these values can be used directly by the decoder. Accordingly, no mapping data is included for input sets/bins that span only pixels that are considered to be sufficiently accurately predicted.

However, for the bins that correspond to pixels which are not sufficiently accurately predicted, the encoder may proceed to generate new mapping values based on using the current set of images as the reference images. As this mapping information cannot be recreated by the decoder, it is included in the encoded data. Thus, the approach may be used to dynamically adapt the mapping to consist of data bins reflecting previous images and data bins reflecting the current images. Thus, the mapping is automatically adapted to be based on the previous images when this is acceptable and the current images when this is necessary. As only the bins generated based on the current images need to be included in the encoded output stream, an automatic adaptation of the communicated mapping information is achieved.
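The selection of bins to communicate may be sketched as follows. The sketch assumes that each pixel has already been assigned to a bin and that the magnitude of the residual is compared against the threshold; the data layout and function names are assumptions for illustration.

```python
def bins_to_transmit(pixel_bins, residuals, threshold):
    """Return the bins that contain at least one insufficiently predicted pixel.

    pixel_bins: bin key for each pixel; residuals: residual value for each pixel.
    Bins spanning only accurately predicted pixels are left out, since the
    decoder can recreate them from previous images.
    """
    selected = set()
    for key, residual in zip(pixel_bins, residuals):
        if abs(residual) >= threshold:
            selected.add(key)
    return selected


def merge_mapping(previous_mapping, transmitted_bins):
    """Decoder side: bins received in the stream replace decoder-generated bins."""
    merged = dict(previous_mapping)
    merged.update(transmitted_bins)
    return merged
```

The merged mapping thus consists of bins reflecting previous images wherever these were adequate and bins reflecting the current images elsewhere.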

Thus in some embodiments, it may be desirable to transmit a better (not decoder-side constructed) LDR-HDR mapping for some regions of the image, e.g. because the encoder can detect that for those regions, the HDR image prediction is not sufficiently good, e.g. because of critical object changes, or because the object is really critical (such as a face).

In some embodiments, a similar approach may alternatively or additionally be used for the residual image. As a low complexity example, the amount of residual image data that is communicated may be adapted in response to a comparison of the input high dynamic range image and the predicted high dynamic range image. As a specific example, the encoder may proceed to evaluate how significant the information in the residual image is. For example, if the average value of the pixels of the residual image is less than a given threshold, this indicates that the predicted image is close to the input HDR image. Accordingly, the encoder may select whether to include the residual image in the encoded output stream or not based on such a consideration. E.g. if the average luminance value is below a threshold, no encoding data for the residual image is included and if it is above the threshold, encoding data for the residual image is included.

In some embodiments a more nuanced selection may be applied wherein residual image data is included for areas in which the pixel values on average are above a threshold but not for image areas in which the pixel values on average are below the threshold. The image areas may for example have a fixed size or may e.g. be dynamically determined (such as by a segmentation process).
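For fixed size image areas, the selection may be sketched as below. The use of the mean absolute residual (rather than a signed average) and the square block layout are assumptions; the function name is hypothetical.

```python
def blocks_to_include(residual, block, threshold):
    """Return top-left (row, column) of each block whose mean |residual| exceeds threshold.

    residual: 2-D list of residual values; block: side length of the square areas.
    """
    height, width = len(residual), len(residual[0])
    selected = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            values = [
                abs(residual[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ]
            if sum(values) / len(values) > threshold:
                selected.append((by, bx))
    return selected
```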

In some embodiments, the encoder may further generate the mapping to provide desired visual effects. For example, in some embodiments, the mapping may not be generated to provide the most accurate prediction but rather may be generated to alternatively or additionally impart a desired visual effect. For example, the mapping may be generated such that the prediction also provides e.g. a color adjustment, a contrast increment, sharpness correction etc. Such a desired effect may for example be applied differently in different areas of the image. For example, image objects may be identified and different approaches for generating the mapping may be used for the different areas.

Indeed, in some embodiments, the encoder may be arranged to select between different approaches for generating the mapping in response to image characteristics, and in particular in response to local image characteristics.

For example, the encoder may provide a larger dynamic range extension in areas dominated by mid-luminance pixels than in areas dominated by high or low luminance pixels. Thus, the encoder may analyze the input LDR or HDR images and dynamically select different approaches for different image areas. For example, a luminance offset may be added to specific bins dependent on characteristics of the area to which they belong. Although this approach may still adapt based on the specific images, it may also be used to provide desired visual image characteristics that perhaps do not result in a closer approximation to the input HDR image but rather to a desired HDR image. The approach may introduce some uncertainty as to how exactly the mapping is generated in the encoder, and in order to allow the decoder to independently match this mapping, the encoder may include data defining or describing the selected mapping. For example, the offset applied to individual bins may be communicated to the decoder.

In the examples, the mapping has been based on an adaptive generation of a mapping based on sets of LDR and HDR input images. In particular, the mapping may be generated based on previous LDR and HDR input images as this does not require any mapping information to be included in the encoded data stream. However, in some cases this is not suitable, e.g. for a scene change, the correlation between a previous image and the current image is unlikely to be very high. In such a case, the encoder may switch to include a mapping in the encoded output data. E.g. the encoder may detect that a scene change occurs and may accordingly proceed to generate the mapping for the image(s) immediately following the scene change based on the current images themselves. The generated mapping data is then included in the encoded output stream. The decoder may proceed to generate mappings based on previous images except for when explicit mapping data is included in the received encoded bit stream, in which case this is used.

In some embodiments, the decoder may use a reference mapping for at least some low dynamic range images of the low dynamic range video sequence. The reference mapping may be a mapping that is not dynamically determined in response to LDR and HDR image sets of the video sequence. A reference mapping may be a predetermined mapping.

For example, the encoder and decoder may both have information of a predetermined default mapping that can be used to generate an HDR image from an LDR image. Thus, in an embodiment where dynamic adaptive mappings are generated from previous images, the default predetermined mapping may be used when such a determined mapping is unlikely to be an accurate reflection of the current image. For example, after a scene change, a reference mapping may be used for the first image(s).

In such cases, the encoder may detect that a scene change has occurred (e.g. by a simple comparison of pixel value differences between consecutive images) and may then include a reference mapping indication in the encoded output stream which indicates that the reference mapping should be used for the prediction. It is likely that the reference mapping will result in a reduced accuracy of the predicted HDR image. However, as the same reference mapping is used by both the encoder and the decoder, this results only in increased values (and thus increased data rate) for the residual image.
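A simple comparison of pixel value differences between consecutive images may, for example, take the form of a mean absolute difference test. The threshold value and the function name in the sketch below are assumptions; practical detectors are often more elaborate.

```python
def is_scene_change(previous, current, threshold=30.0):
    """Flag a scene change when the mean absolute pixel difference exceeds threshold.

    previous, current: 2-D lists of pixel values with identical dimensions.
    """
    diffs = [
        abs(p - c)
        for row_prev, row_cur in zip(previous, current)
        for p, c in zip(row_prev, row_cur)
    ]
    return sum(diffs) / len(diffs) > threshold
```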

In some embodiments, the encoder and decoder may be able to select the reference mapping from a plurality of reference mappings. Thus, rather than using just one reference mapping, the system may have shared information of a plurality of predetermined mappings. In such embodiments, the encoder may generate a predicted HDR image and corresponding residual image for all possible reference mappings. It may then select the one that results in the smallest residual image (and thus in the lowest encoded data rate). The encoder may include in the encoded output stream a reference mapping indicator which explicitly defines which reference mapping has been used. Such an approach may improve the prediction and thus reduce the data rate required for communicating the residual image in many scenarios.
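The selection among a plurality of reference mappings may be sketched as an exhaustive search. The sketch assumes that each reference mapping is a simple look-up table indexed by LDR value and that the size of the residual image is measured as a sum of absolute differences; both are simplifications for illustration.

```python
def select_reference_mapping(ldr_pixels, hdr_pixels, reference_mappings):
    """Return the index of the reference mapping giving the smallest residual.

    ldr_pixels, hdr_pixels: flat lists of corresponding pixel values.
    reference_mappings: list of look-up tables, each indexed by LDR value.
    """
    best_index, best_cost = None, None
    for index, lut in enumerate(reference_mappings):
        cost = sum(abs(h - lut[l]) for l, h in zip(ldr_pixels, hdr_pixels))
        if best_cost is None or cost < best_cost:
            best_index, best_cost = index, cost
    return best_index
```

Only the returned index would need to be communicated as the reference mapping indicator, since both encoder and decoder are assumed to hold the same set of tables.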

Thus, in some embodiments a fixed LUT (mapping) may be used (or one selected from a fixed set and with only the corresponding index being transmitted) for the first frame or the first frame after a scene change. Although the residual for such frames will generally be higher, this is typically outweighed by the fact that no mapping data has to be encoded.

In the examples, the mapping is thus arranged as a multidimensional map having two spatial image dimensions and at least one combination value dimension. This provides a particularly efficient structure.

In some embodiments, a multi-dimensional filter may be applied to the multidimensional map, the multi-dimensional filter including at least one combination value dimension and at least one of the spatial image dimensions. Specifically, a moderate multi-dimensional low-pass filter may in some embodiments be applied to the multi-dimensional grid. This may in many embodiments result in an improved prediction and thus reduced data rate. Specifically, it may improve the prediction quality for some signals, such as smooth intensity gradients that typically result in contouring artifacts when represented at insufficient bit depth.
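A moderate low-pass filter over a three dimensional grid may be sketched as a separable three-tap kernel applied along each dimension in turn. The kernel weights (1/4, 1/2, 1/4), the edge clamping and the nested-list representation are assumptions chosen to keep the example short.

```python
def smooth_axis(grid, axis):
    """Apply a [1/4, 1/2, 1/4] kernel along one axis of a 3-D nested list."""
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    out = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                acc = 0.0
                for offset, weight in ((-1, 0.25), (0, 0.5), (1, 0.25)):
                    n = [i, j, k]
                    # Clamp at the borders so that edge bins reuse their own value.
                    n[axis] = min(max(n[axis] + offset, 0), shape[axis] - 1)
                    acc += weight * grid[n[0]][n[1]][n[2]]
                out[i][j][k] = acc
    return out


def lowpass(grid):
    """Filter across both spatial dimensions and the value dimension."""
    for axis in range(3):
        grid = smooth_axis(grid, axis)
    return grid
```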

In the previous description a single HDR image has been generated from an LDR image. However, multi-view capturing and rendering of scenes has been of increasing interest. For example, three dimensional (3D) television is being introduced to the consumer market. As another example, multi-view computer displays allowing a user to look around objects etc. have been developed.

A multi-view image may thus comprise a plurality of images of the same scene captured or generated from different view points. The following will focus on a description for a stereo-view comprising a left and right (eye) view of a scene. However, it will be appreciated that the principles apply equally to views of a multi-view image comprising more than two images corresponding to different directions and that in particular the left and right images may be considered to be two images for two views out of the more than two images/views of the multi-view image.

In many scenarios it is accordingly desirable to be able to efficiently generate, encode or decode multi-view images and this may in many scenarios be achieved by one image of the multi-view image being dependent on another image.

For example, based on an HDR image for a first view, an HDR image for a second view may be encoded. For example, as illustrated in FIG. 12, the encoder of FIG. 2 may be enhanced to provide encoding for a stereo view image. Specifically, the encoder of FIG. 12 corresponds to the encoder of FIG. 2 but further comprises a second receiver 1201 which is arranged to receive a second HDR image. In the following, the HDR image received by the first receiver 201 will be referred to as the first view image and the HDR image received by the second receiver 1201 will be referred to as the second view image. The first and second view images are particularly right and left images of a stereo image and thus, when provided to the right and left eyes of a viewer, provide a three dimensional experience.

The first view image is encoded as previously described. Furthermore, the encoded first view image is fed to a view predictor 1203 which proceeds to generate a prediction for the second view image from the first view image. Specifically, the system comprises an HDR decoder 1205 between the HDR encoder 213 and the view predictor 1203 which decodes the encoding data for the first view image and provides the decoded image to the view predictor 1203, which then generates a prediction for the second view image therefrom. In a simple example, the first view image may itself be used directly as a prediction for the second image.

The encoder of FIG. 12 further comprises a second encoder 1207 which receives the predicted image from the view predictor 1203 and the original image from the second receiver 1201. The second encoder 1207 proceeds to encode the second view image in response to the predicted image from the view predictor 1203. Specifically, the second encoder 1207 may subtract the predicted image from the second view image and encode the resulting residual image. The second encoder 1207 is coupled to the output processor 215 which includes the encoded data for the second view image in the output stream.

The described approach may allow a particularly efficient encoding for multi-view HDR images. In particular, a very low data rate for a given image quality can be achieved.

Different approaches may be used for predicting the second view image from the first view image. As mentioned, the first view image may even in some examples be used directly as the prediction of the second view.

A particularly efficient and high performance system may be based on the same approach of mapping as described for the mapping between the LDR and HDR images.

Specifically, based on reference images, a mapping may be generated which relates input data in the form of input sets of image spatial positions and a combination of color coordinates of high dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values. Thus, the mapping is generated to reflect a relationship between a reference high dynamic range image for the first view (i.e. corresponding to the first view image) and a corresponding reference high dynamic range image for the second view (i.e. corresponding to the second view image).

This mapping may be generated using the same principles as previously described for the LDR to HDR mapping. In particular, the mapping may be generated based on a previous stereo image. For example, for the previous stereo image, each spatial position may be evaluated with the appropriate bin of a mapping being identified as the one covering a matching image spatial interval and HDR colour coordinate intervals. The corresponding HDR colour coordinate values in the reference image for the second view may then be used to generate the output value for that bin (and may in some examples be used directly as the output value). Thus, the approach may provide advantages in line with those of the approach being applied to LDR to HDR mapping including automatic generation of mapping, accurate prediction, practical implementations etc.
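The generation of such a view-to-view mapping from a previous stereo image, and its application to the current first view image, may be sketched as follows. The sketch assumes a single value per pixel (e.g. luminance), uniform bins, averaging of all second view values that fall into a bin, and a nearest-bin look-up without interpolation; all names and step sizes are hypothetical.

```python
from collections import defaultdict


def build_mapping(ref_first, ref_second, spatial_step=2, value_step=64):
    """Build a bin -> output value table from a previous first/second view image pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, row in enumerate(ref_first):
        for x, value in enumerate(row):
            key = (x // spatial_step, y // spatial_step, value // value_step)
            sums[key] += ref_second[y][x]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(first_view, mapping, spatial_step=2, value_step=64):
    """Predict the second view image; unpopulated bins fall back to the input value."""
    predicted = []
    for y, row in enumerate(first_view):
        out_row = []
        for x, value in enumerate(row):
            key = (x // spatial_step, y // spatial_step, value // value_step)
            out_row.append(mapping.get(key, float(value)))
        predicted.append(out_row)
    return predicted
```

Because both functions rely only on previously decoded images, a decoder could rebuild the same table without any mapping data being transmitted.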

A particularly efficient implementation of encoders may be achieved by using common, identical or shared elements. In some systems, a predictive encoder module may be used for a plurality of encoding operations.

Specifically, a basic encoding module may be arranged to encode an input image based on a prediction of the image. The basic encoding module may specifically have the following inputs and outputs:

an encoding input for receiving an image to be encoded;

a prediction input for receiving a prediction for the image to be encoded; and

an encoder output for outputting the encoded data for the image to be encoded.

An example of such an encoding module is the encoding module illustrated in FIG. 13. The specific encoding module uses an H264 codec 1301 which receives the input signal IN containing the data for the image to be encoded. Further, the H264 codec 1301 generates the encoded output data BS by encoding the input image in accordance with the H264 encoding standards and principles. This encoding is based on one or more prediction images which are stored in prediction memories 1303, 1305. One of these prediction memories 1305 is arranged to store the input image from the prediction input (INex). In particular, the basic encoding module may overwrite prediction images generated by the basic encoding module itself. Thus, in the example, the prediction memories 1303, 1305 are, in accordance with the H264 standard, filled with previous prediction data generated by decoding of previous encoded images of the video sequence. However, in addition, at least one of the prediction memories 1305 is overwritten by the input image from the prediction input, i.e. by a prediction generated externally. Whereas the prediction data generated internally in the encoding module typically comprises temporal or spatial predictions, i.e. predictions from previous or future images of the video sequence or from spatially neighbouring areas, the prediction provided on the prediction input may typically be a non-temporal, non-spatial prediction. For example, it may be a prediction based on an image from a different view. For example, the second view image may be encoded using an encoding module as described, with the first view image being fed to the prediction input.

The exemplary encoding module of FIG. 13 further comprises an optional decoded image output OUT_(loc) which can provide the decoded image resulting from decoding of the encoded data to external functionality. Furthermore, a second optional output in the form of a delayed decoded image output OUT_(loc(τ-1)) provides a delayed version of the decoded image.

The encoding unit may specifically be an encoding unit as described in WO2008084417, the contents of which is hereby incorporated by reference.

Thus, in some examples the system may encode a video signal wherein image compression is performed and multiple temporal predictions are used with multiple prediction frames being stored in a memory, and wherein a prediction frame in memory is overwritten with a separately produced prediction frame.

The overwritten prediction frame may specifically be one or more of the prediction frames longest in memory.
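The interface of such a basic encoding module may be illustrated by the following toy model. It is not an H264 codec: images are flat lists, the "encoding" is a lossless residual against the best matching prediction frame, and the class and method names are hypothetical. It only illustrates how an externally supplied prediction may take the place of the prediction frame that has been longest in memory.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b))


class BasicEncodingModule:
    def __init__(self, num_memories=2):
        self.num_memories = num_memories
        self.memories = []  # internally generated prediction frames, oldest first

    def encode(self, image, external_prediction=None):
        candidates = list(self.memories)
        if external_prediction is not None:
            if len(candidates) >= self.num_memories:
                candidates.pop(0)  # overwrite the frame longest in memory
            candidates.append(external_prediction)
        if candidates:
            index, prediction = min(
                enumerate(candidates), key=lambda c: sad(image, c[1])
            )
        else:
            index, prediction = None, [0] * len(image)
        residual = [a - b for a, b in zip(image, prediction)]
        # The toy model is lossless, so the decoded image equals the input image.
        self.memories.append(list(image))
        if len(self.memories) > self.num_memories:
            self.memories.pop(0)
        return {"prediction_index": index, "residual": residual}
```

The same class could be instantiated once and used time sequentially, or instantiated several times, for the different encoding operations.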

The memory may be a memory in an enhancement stream encoder and a prediction frame may be overwritten with a frame from a base stream encoder.

In particular, a temporal prediction frame may be overwritten with a depth view frame.

The encoding module may be used in many advantageous configurations and topologies, and allows for a very efficient yet low cost implementation. For example, in the encoder of FIG. 12, the same encoding module may be used for the LDR encoder 205, the HDR encoder 213 and the second HDR encoder 1207.

Various advantageous configurations and uses of an encoding module such as that of FIG. 13 will be described with reference to FIGS. 14-17.

FIG. 14 illustrates an example wherein a basic encoding module, such as that of FIG. 13, may be used for encoding of both an LDR image and a corresponding HDR image in accordance with the previously described principles. In the example, the basic encoding module 1401, 1405 is used both to encode the LDR image and the HDR image. In the example, the LDR image is fed to the encoding module 1401 which proceeds to generate an encoded bitstream BS LDR without any prediction for the LDR image being provided on the prediction input (although the encoding may use internally generated predictions, such as temporal predictions used for motion compensation).

The basic encoding module 1401 further generates a decoded version of the LDR image on the decoded image output and a delayed decoded image on the delayed decoded image output. These two decoded images are fed to the predictor 1403 which further receives a delayed decoded HDR image, i.e. a previous HDR image. The predictor 1403 proceeds to generate a mapping based on the previous (delayed) decoded LDR and HDR images. It then proceeds to generate a predicted image for the current HDR image by applying this mapping to the current decoded LDR image.

The basic encoding module 1405 then proceeds to encode the HDR image based on the predicted image. Specifically, the predicted image is fed to the prediction input of the basic encoding module 1405 and the HDR image is fed to the input. The basic encoding module 1405 then generates an output bitstream BS HDR corresponding to the HDR image. The two bitstreams BS LDR and BS HDR may be combined into a single output bitstream.

In the example, the same encoding module (represented by the two functional manifestations 1401, 1405) is thus used to encode both the LDR and the HDR image. This may be achieved using only one basic encoding module time sequentially. Alternatively, identical basic encoding modules can be implemented. This may result in substantial cost saving.

In the example, the HDR image is thus encoded in dependence on the LDR image whereas the LDR image is not encoded in dependence on the HDR image. Thus, a hierarchical arrangement of encoding is provided where a joint encoding/compression is achieved with one image being dependent on another (which however is not dependent on the first image).

It will be appreciated that the example of FIG. 14 may be seen as a specific implementation of the encoder of FIG. 2 where identical or the same encoding module is used for the HDR and LDR image. Specifically, the same basic encoding module may be used to implement both the LDR encoder 205 and LDR decoder 207 as well as the HDR encoder 213 of FIG. 2.

Another example is illustrated in FIG. 15. In this example, a plurality of identical or a single basic encoding module 1501, 1503 is used to perform an efficient encoding of a stereo image. In the example, a left LDR image is fed to a basic encoding module 1501 which proceeds to encode the left LDR image without relying on any prediction. The resulting encoding data is output as first bitstream L BS. Image data for a right LDR image is input on the image data input of a basic encoding module 1503. Furthermore, the left image is used as a prediction image and thus the decoded image output of the basic encoding module 1501 is coupled to the prediction input of the basic encoding module 1503 such that the decoded version of the L LDR image is fed to the prediction input of the basic encoding module 1503 which proceeds to encode the right LDR image based on this prediction. The basic encoding module 1503 thus generates a second bitstream R BS comprising encoding data for the right image (relative to the left image).

FIG. 16 illustrates an example wherein a plurality of identical or a single basic encoding module 1401, 1405, 1603, 1601 is used to provide a joint and combined encoding of both HDR and stereo views. In the example, the approach of FIG. 14 is applied to left LDR and HDR images. In addition, a right HDR image is encoded based on the left HDR image. Specifically, a right HDR image is fed to the image data input of a basic encoding module 1601 of which the prediction input is coupled to the decoded image output of the basic encoding module 1405 encoding the left HDR image. Thus, in the example, the right HDR image is encoded by the basic encoding module 1601 based on the left HDR image. Thus, the encoder of FIG. 16 generates a left LDR image bitstream L BS, a left HDR image bitstream L HDR BS, and a right HDR image bitstream R HDR BS.

In the specific example of FIG. 16, a fourth bitstream may also be encoded for a right LDR image. In the example, a basic encoding module 1603 receives a right LDR image on the image data input whereas the decoded version of the left LDR image is fed to the prediction input. The basic encoding module 1603 then proceeds to encode the right LDR image to generate the fourth bitstream R BS.

Thus, in the example of FIG. 16, both stereo and HDR characteristics are jointly and efficiently encoded/compressed. In the example, the left view LDR image is independently coded and the right view LDR image depends on the left LDR image. Furthermore, the left HDR image depends on the left LDR image. The right HDR image depends on the left HDR image and thus also on the left LDR image. In the example the right LDR image is not used for encoding/decoding any of the stereo HDR images. An advantage of this is that only 3 basic modules are required for encoding/decoding the stereo HDR signal. As such, this solution provides improved backwards compatibility.
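The dependencies between the four bitstreams of FIG. 16 can be written down as a small table, from which the set of streams needed to decode a given image follows directly. The dictionary layout and function name below are illustrative assumptions; the dependencies themselves are those stated above.

```python
# Each stream lists the stream whose decoded image it uses as prediction (FIG. 16).
DEPENDS_ON = {
    "L BS": [],
    "R BS": ["L BS"],
    "L HDR BS": ["L BS"],
    "R HDR BS": ["L HDR BS"],
}


def required_streams(target, depends_on=DEPENDS_ON):
    """Return the streams needed to decode target, in decoding order."""
    ordered = []

    def visit(name):
        for dependency in depends_on[name]:
            visit(dependency)
        if name not in ordered:
            ordered.append(name)

    visit(target)
    return ordered
```

Decoding the right HDR image requires three streams and thus three applications of a basic module, while the right LDR stream is not needed.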

FIG. 17 illustrates an example wherein the encoder of FIG. 16 is enhanced such that the right LDR image is also used to encode the right HDR image. Specifically, a prediction of the right HDR image may be generated from the right LDR image using the same approach as for the left HDR image. Specifically, a mapping as previously described may be used. In the example, the prediction input of the basic encoding module 1601 is arranged to receive two prediction images which may both be used for the encoding of the right HDR image. For example, the two prediction images may overwrite two prediction memories of the basic encoding module 1601.

Thus, in this example, both stereo and HDR are jointly encoded and (more) efficiently compressed. Here, the left view LDR image is independently coded and the right view LDR image is encoded dependent on the left LDR image. In this example, the right LDR image is also used for encoding/decoding the stereo HDR signal, and specifically the right HDR image. Thus, in the example, two predictions may be used for the right HDR image thereby allowing higher compression efficiency, albeit at the expense of requiring four basic encoding modules (or reusing the same basic encoding module four times).

Thus, in the examples of FIGS. 14-17, the same basic encoding/compression module is used for joint HDR and stereo coding, which is both beneficial for compression efficiency and for implementation practicality and cost.

It will be appreciated that FIGS. 14-17 are functional illustrations and may reflect a time sequential use of the same encoding module or may e.g. illustrate parallel applications of identical encoding modules.

The described encoding examples thus generate output data which includes an encoding of one or more images based on one or more images. Thus, in the examples, at least two images are jointly encoded such that one is dependent on the other but with the other not being dependent on the first. For example, in the encoder of FIG. 16, the two HDR images are jointly encoded with the right HDR image being encoded in dependence on the left HDR image (via the prediction) whereas the left HDR image is encoded independently of the right HDR image.

This asymmetric joint encoding can be used to generate advantageous output streams. Specifically, the two output streams R HDR BS and L HDR BS for the right and left HDR images respectively are generated (split) as two different data streams which can be multiplexed together to form the output data stream. The L HDR BS data stream which does not require data from the R HDR BS data stream may be considered a primary data stream and the R HDR BS data stream which does require data from the L HDR BS data stream may be considered a secondary data stream. In a particularly advantageous example the multiplexing is done such that the primary and secondary data streams are provided with separate codes. Thus, a different code (header/label) is assigned to the two data streams thereby allowing the individual data streams to be separated and identified in the output data stream.

As a specific example, the output data stream may be divided into data packets or segments with each packet/segment comprising data from only the primary or the secondary data stream and with each packet/segment being provided with a code (e.g. in a header, preamble, midamble or postamble) that identifies which stream is included in the specific packet/segment.

Such an approach may allow improved performance and may in particular allow backwards compatibility. For example, a fully compatible stereo decoder may be able to extract both the right and left HDR images to generate a full stereo HDR image. However, a non-stereo decoder can extract only the primary data stream. Indeed, as this data stream is independent of the right HDR image, the non-stereo decoder can proceed to decode a single HDR image using non-stereo techniques.

It will be appreciated that the approach may be used for different encoders. For example, for the encoder of FIG. 14, the BS LDR bit stream may be considered the primary data stream and the BS HDR bit stream may be considered the secondary data stream. In the example of FIG. 15, the L BS bit stream may be considered the primary data stream and the R BS bit stream may be considered the secondary data stream. Thus, in some examples, the primary data stream may comprise data which is fully self contained, i.e. which does not require any other encoding data input (i.e. which is not dependent on encoding data from any other data stream but is encoded self consistently).

Also, the approach may be extended to more than two bit streams. For example, for the encoder of FIG. 16, the L BS bitstream (which is fully self contained) may be considered the primary data stream, the L HDR BS (which is dependent on the L BS bitstream but not on the R HDR BS bitstream) may be considered the secondary data stream, and the R HDR BS bitstream (which is dependent on both the L BS and the L HDR BS bitstream) may be considered a tertiary data stream. The three data streams may be multiplexed together with each data stream being allocated its own code.

As another example, the four bit streams generated in the encoder of FIG. 16 or 17 may be included in four different parts of the output data stream. As a specific example, the multiplexing of the bit streams may generate an output stream including the following parts: part1 containing all L BS packets with descriptor code 0x1B (regular H264), part2 containing all R BS packets with descriptor code 0x20 (the dependent stereo view of MVC), part3 containing all L HDR BS packets with descriptor code 0x21 and part4 containing all R HDR BS enh packets with descriptor code 0x22. This type of multiplexing allows for flexible usage of the stereo HDR multiplex while maintaining the backward compatibility with MVC stereo and H264 mono. In particular, the specific codes allow a traditional H264 decoder to decode an LDR image while allowing suitably equipped (e.g. H264 or MVC based) decoders to decode more advanced images, such as the HDR and/or stereo images.
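As a rough illustration of how the four descriptor codes give backward compatibility, the sketch below filters a multiplexed packet list by the set of codes a given receiver recognizes. The descriptor codes are those given in the text; the profile names in `DECODER_PROFILES`, the `(code, payload)` packet representation and the function name `demultiplex` are assumptions made purely for the example.

```python
from typing import Dict, List, Set, Tuple

# Descriptor codes as given in the text.
L_BS = 0x1B      # regular H264 (left LDR view)
R_BS = 0x20      # dependent stereo view of MVC
L_HDR_BS = 0x21  # left HDR data
R_HDR_BS = 0x22  # right HDR enhancement data

# Hypothetical receiver profiles: which codes each kind of receiver recognizes.
DECODER_PROFILES: Dict[str, Set[int]] = {
    "h264_mono": {L_BS},
    "mvc_stereo": {L_BS, R_BS},
    "hdr_mono": {L_BS, L_HDR_BS},
    "hdr_stereo": {L_BS, R_BS, L_HDR_BS, R_HDR_BS},
}

Packet = Tuple[int, bytes]  # (descriptor code, payload)


def demultiplex(packets: List[Packet], profile: str) -> List[Packet]:
    """Pass on only the packets whose descriptor code the receiver recognizes."""
    accepted = DECODER_PROFILES[profile]
    return [(code, payload) for code, payload in packets if code in accepted]


multiplexed: List[Packet] = [
    (L_BS, b"l0"), (R_BS, b"r0"), (L_HDR_BS, b"lh0"), (R_HDR_BS, b"rh0"),
    (L_BS, b"l1"), (R_BS, b"r1"), (L_HDR_BS, b"lh1"), (R_HDR_BS, b"rh1"),
]

legacy_view = demultiplex(multiplexed, "h264_mono")
stereo_view = demultiplex(multiplexed, "mvc_stereo")
```

In this sketch a legacy mono receiver sees only the 0x1B packets, so the additional parts do not reach its decoder at all.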

The generation of the output stream may specifically follow the approach described in WO2009040701, which is hereby incorporated by reference.

Such approaches may combine the advantages of other methods while avoiding their respective drawbacks. The approach comprises jointly compressing two or more video data signals, followed by forming two or more (primary and secondary) separate bit streams. The primary bit stream is self-contained (or not dependent on the secondary bit stream) and can be decoded by video decoders that may not be capable of decoding both bit streams. The one or more secondary bit streams (often called auxiliary-video-representation streams) are dependent on the primary bit stream. The separate bit streams are multiplexed, with the primary and secondary bit streams being provided with separate codes and transmitted. Prima facie it may seem superfluous and a waste of effort to first jointly compress signals only to split them again after compression and provide them with separate codes. In common techniques the compressed video data signal is given a single code in the multiplexer. Prima facie the approach seems to add an unnecessary complexity to the encoding of the video data signal.

However, it has been realized that splitting and separately packaging the primary and secondary bit streams in the multiplexed signal (i.e. giving the primary and secondary bit stream separate codes in the multiplexer) has two results. On the one hand, a standard demultiplexer in a conventional video system will recognize the primary bit stream by its code and send it to the decoder, so that the standard video decoder receives only the primary stream (the secondary stream not having passed the demultiplexer) and is thus able to correctly process it as a standard video data signal. On the other hand, a specialized system can completely reverse the encoding process and re-create the original enhanced bit stream before sending it to a suitable decoder.

In the approach the primary and secondary bit streams are separate bit streams wherein the primary bit stream may specifically be a self-contained bit stream. This allows the primary bit stream to be given a code corresponding to a standard video data signal while giving the secondary bit stream or secondary bit streams codes that will not be recognized by standard demultiplexers as a standard video data signal. At the receiving end, standard demultiplexing devices will recognize the primary bit stream as a standard video data signal and pass it on to the video decoder. The standard demultiplexing devices will reject the secondary bit streams, not recognizing them as standard video data signals. The video decoder itself will only receive the “standard video data signal”. The amount of bits received by the video decoder itself is thus restricted to the primary bit stream, which may be self-contained and in the form of a standard video data signal, which is interpretable by standard video devices, and which has a bitrate that standard video devices can cope with. The video decoder is not overloaded with bits it cannot handle.

The coding can be characterized in that a video data signal is encoded with the encoded signal comprising a first and at least a second set of frames, wherein the frames of the first and second set are interleaved to form an interleaved video sequence, or in that an interleaved video data signal comprising a first and second set of frames is received, wherein the interleaved video sequence is compressed into a compressed video data signal, wherein the frames of the first set are encoded and compressed without using frames of the second set, and the frames of the second set are encoded and compressed using frames of the first set, and whereafter the compressed video data signal is split into a primary and at least a secondary bit stream, each bit stream comprising frames, wherein the primary bit stream comprises compressed frames for the first set, and the secondary bit stream for the second set, the primary and secondary bit streams forming separate bit streams, whereafter the primary and secondary bit streams are multiplexed into a multiplexed signal, the primary and secondary bit stream being provided with separate codes.

After the interleaving at least one set, namely the set of frames of the primary bit stream, may be compressed as a “self-contained” signal. This means that the frames belonging to this self-contained set of frames do not need any information (e.g. via motion compensation, or any other prediction scheme) from the other secondary bit streams.

The primary and secondary bit streams form separate bit streams and are multiplexed with separate codes for reasons explained above.

In some examples, the primary bit stream comprises data for frames of one view of a multi-view video data signal and the secondary bit stream comprises data for frames of another view of a multi-view data signal.

FIG. 18 illustrates an example of possible interleaving of two views, such as the HDR left (L) and right (R) views of the encoder of FIG. 16, each comprised of frames 0 to 7, into an interleaved combined signal having frames 0 to 15.

In the specific example, the frames/images of the L HDR BS and the R HDR BS of FIG. 16 are divided into individual frames/segments as shown in FIG. 17.

The frames of the left and right view are then interleaved to provide a combined signal. The combined signal resembles a two dimensional signal. A special feature of the compression is that the frames of one of the views are not dependent on the other (and may be a self-contained system), i.e. in compression no information from the other view is used for the compression. The frames of the other view are compressed using information from frames of the first view. The approach departs from the natural tendency to treat two views on an equal footing. In fact, the two views are not treated equally during compression. One of the views becomes the primary view, for which during compression no information is used from the other view; the other view is secondary. The frames of the primary view and the frames of the secondary view are split into a primary bit stream and a secondary bit stream. The coding system can comprise a multiplexer which assigns a code, e.g. 0x01 for MPEG or 0x1B for H.264, recognizable for standard video as a video bit stream, to the primary bit stream and a different code, e.g. 0x20, to the secondary stream. The multiplexed signal is then transmitted. The signal can be received by a decoding system where a demultiplexer recognizes the two bit streams 0x01 or 0x1B (for the primary stream) and 0x20 (for the secondary stream) and sends them both to a bit stream merger which merges the primary and secondary stream again, and the combined video sequence is then decoded by reversing the encoding method in a decoder.
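The interleave, split and merge steps can be sketched as below. This is only a sketch under stated assumptions: frames are represented by string labels rather than compressed data, the interleaving is assumed to alternate strictly between left and right frames (the exact order of FIG. 18 is not reproduced here), and the function names are hypothetical. The stream codes 0x1B and 0x20 are the example codes from the text.

```python
from typing import List, Tuple

PRIMARY_CODE = 0x1B    # example code for the primary stream (H.264)
SECONDARY_CODE = 0x20  # example code for the secondary stream

Frame = Tuple[int, str]  # (position in the combined sequence, frame label)


def interleave(left: List[str], right: List[str]) -> List[str]:
    """Alternate left and right frames into one combined sequence (assumed order)."""
    combined: List[str] = []
    for l_frame, r_frame in zip(left, right):
        combined.extend([l_frame, r_frame])
    return combined


def split(combined: List[str]) -> Tuple[List[Frame], List[Frame]]:
    """Split the combined sequence into primary (even) and secondary (odd) frames."""
    indexed = list(enumerate(combined))
    return indexed[0::2], indexed[1::2]


def merge(primary: List[Frame], secondary: List[Frame]) -> List[str]:
    """Bit stream merger: restore the combined sequence from both streams."""
    return [label for _, label in sorted(primary + secondary)]


left_view = [f"L{i}" for i in range(8)]
right_view = [f"R{i}" for i in range(8)]

combined = interleave(left_view, right_view)   # 16 frames, positions 0 to 15
primary, secondary = split(combined)
tagged = [(PRIMARY_CODE, f) for f in primary] + [(SECONDARY_CODE, f) for f in secondary]

# A standard receiver keeps only the primary code; a specialized one merges both.
standard = [f[1] for code, f in tagged if code == PRIMARY_CODE]
restored = merge(primary, secondary)
```

Keeping the position of each frame alongside its label is one simple way to let the merger restore the original order; a real system would rely on the timing information already present in the bit streams.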

It will be appreciated that the encoder examples of FIGS. 14-17 can directly be transferred to the corresponding operations at the decoder end. Specifically, FIG. 19 illustrates a basic decoding module which is a decoding module complementary to the basic encoding module of FIG. 13. The basic decoding module has an encoder data input for receiving encoder data for an encoded image which is to be decoded. Similarly to the basic encoding module, the basic decoding module comprises a plurality of prediction memories 1901 as well as a prediction input for receiving a prediction for the encoded image that is to be decoded. The basic decoding module comprises a decoder unit 1903 which decodes the encoding data based on the prediction(s) to generate a decoded image which is output on the decoder output OUT_(loc). The decoded image is further fed to the prediction memories. As for the basic encoding module, the prediction data on the prediction input may overwrite data in prediction memories 1901. Also, similarly to the basic encoding module, the basic decoding module has an (optional) output for providing a delayed decoded image.

It will be clear that such a basic decoding module can be used complementary to the basic encoding module in the examples of FIGS. 14-17. For example, FIG. 20 illustrates a decoder complementary to the encoder of FIG. 14. A demultiplexer (not shown) separates the LDR encoding data Enc LDR and the HDR encoding data Enc HDR. A first basic decoding module decodes the LDR image and uses this to generate a prediction for the HDR image as explained for FIG. 14. A second basic decoding module (identical to the first basic decoding module or indeed the first basic decoding module used in time sequential fashion) then decodes the HDR image from the HDR encoding data and the prediction.
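The two-stage decoding of FIG. 20 can be sketched in simplified form as follows. This is a sketch only: images are small lists of floats, the HDR encoding data is assumed to be an already decoded residual that is applied by addition, and `toy_prediction` is a deliberately trivial stand-in (a fixed gain) for the LDR-to-HDR mapping described elsewhere in this document. None of these names or simplifications come from the figures.

```python
from typing import Callable, List

Image = List[List[float]]  # rows of pixel values


def toy_prediction(ldr: Image) -> Image:
    """Hypothetical stand-in for the mapping-based prediction: a fixed gain of 4."""
    return [[4.0 * value for value in row] for row in ldr]


def decode_hdr(ldr: Image, residual: Image,
               predict: Callable[[Image], Image]) -> Image:
    """Second decoding stage: HDR image = prediction from the LDR image + residual."""
    predicted = predict(ldr)
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


# Output of the first decoding stage (decoded LDR image) and a decoded residual.
decoded_ldr: Image = [[1.0, 2.0], [3.0, 4.0]]
decoded_residual: Image = [[0.5, -1.0], [0.0, 2.0]]

hdr = decode_hdr(decoded_ldr, decoded_residual, toy_prediction)
```

Passing the predictor in as a function mirrors the idea that the same decoding step can be reused with whichever prediction the first stage supplies.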

As another example, FIG. 21 illustrates an example of a complementary decoder to the encoder of FIG. 15. In the example, encoding data for the left image is fed to a first basic decoding module which decodes the left image. This is further fed to the prediction input of a second basic decoding module which also receives encoding data for the right image and which proceeds to decode this data based on the prediction, thereby generating the right image.

As yet another example, FIG. 22 illustrates an example of a complementary decoder to the encoder of FIG. 16.

It will be appreciated that FIGS. 20-22 are functional illustrations and may reflect a time sequential use of the same decoding module or may e.g. illustrate parallel applications of identical decoding modules.

Although the principles have been described with an encoding (decoding) employing a spatially local mapping between the LDR and HDR (color graded) images, other prediction strategies can be used for the LDR-HDR prediction (conversion). E.g., transformation strategies can be used on local regions of a picture, which may be mapping functions, or even parametric coarse level (tentative) rendering intent transformations, like e.g. the regime coding of prior European application EP10155277.6.

Also, coarse semi-global adjustment profiles over a substantial regional extent of a set of images for certain time instants can be used to relate an HDR picture to an LDR picture (possibly with further refinement data), as e.g. in the virtual backlight encoding described in EP10177155.8.

It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional circuits, units and processors. However, it will be apparent that any suitable distribution of functionality between different functional circuits, units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units or circuits are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.

The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.

Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.

Furthermore, although individually listed, a plurality of means, elements, circuits or method steps may be implemented by e.g. a single circuit, unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to “a”, “an”, “first”, “second” etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

1. A method of encoding an input image, the method comprising: receiving the input image; generating a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values in response to a reference low dynamic range image and a corresponding reference high dynamic range image; and generating an output encoded data stream by encoding the input image in response to the mapping.
2. The method of claim 1 wherein the input image is an input high dynamic range image; and the method further comprises: receiving an input low dynamic range image corresponding to the input high dynamic range image; generating a prediction base image from the input low dynamic range image; predicting a predicted high dynamic range image from the prediction base image in response to the mapping; encoding a residual high dynamic range image in response to the predicted high dynamic range image and the input high dynamic range image to generate encoded high dynamic range data; and including the encoded high dynamic range data in the output encoded data stream.
3. The method of claim 1 wherein each input set corresponds to a spatial interval for each spatial image dimension and at least one value interval for the combination, and the generation of the mapping comprises for each image position of at least a group of image positions of the reference low dynamic range image: determining at least one matching input set having spatial intervals corresponding to the each image position and a value interval for the combination corresponding to a combination value for the each image position in the reference low dynamic range image; and determining an output high dynamic range pixel value for the matching input set in response to a high dynamic range pixel value for the each image position in the reference high dynamic range image.
4. The method of claim 1 wherein the mapping is at least one of: a spatially subsampled mapping; a temporally subsampled mapping; and a combination value subsampled mapping.
5. The method of claim 1 wherein the input image is an input high dynamic range image; and the method further comprises: receiving an input low dynamic range image corresponding to the input high dynamic range image; generating a prediction base image from the input low dynamic range image; predicting a predicted high dynamic range image from the prediction base image in response to the mapping; and adapting at least one of the mapping and a residual high dynamic range image for the predicted high dynamic range image in response to a comparison of the input high dynamic range image and the predicted high dynamic range image.
6. The method of claim 1 wherein the input image is the reference high dynamic range image and the reference low dynamic range image is an input low dynamic range image corresponding to the input image.
7. The method of claim 1 wherein the input sets for the mapping further comprise depth indications associated with image spatial positions and the mapping further reflects a relationship between depth and high dynamic range pixel values.
8. The method of claim 1, wherein the generating an output encoded data stream comprises adding a derived mapping specification to the output encoded data stream on the basis of at least parts of the mapping.
9. A method of generating a high dynamic range image from a low dynamic range image, the method comprising: receiving the low dynamic range image; providing a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image; and generating the high dynamic range image in response to the low dynamic range image and the mapping.
10. The method of claim 9 wherein generating the high dynamic range image comprises determining at least part of a predicted high dynamic range image by for each position of at least part of the predicted dynamic range image: determining at least one matching input set matching the each position and a first combination of color coordinates of low dynamic range pixel values associated with the each position; retrieving from the mapping at least one output high dynamic range pixel value for the at least one matching input set; determining a high dynamic range pixel value for the each position in the predicted high dynamic range image in response to the at least one output high dynamic range pixel value; and determining the high dynamic range image in response to the at least part of the predicted high dynamic range image.
11. The method of claim 9 wherein the low dynamic range image is an image of a low dynamic range video sequence and the method comprises generating the mapping using a previous low dynamic range image of the low dynamic range video sequence as the reference low dynamic range image and a previous high dynamic range image generated for the previous low dynamic range image as the reference high dynamic range image.
12. The method of claim 11 wherein the previous high dynamic range image is further generated in response to residual image data for the previous low dynamic range image relative to predicted image data for the previous low dynamic range image.
13. The method of claim 9 wherein the low dynamic range image is an image of a low dynamic range video sequence, and the method further comprises using a nominal mapping for at least some low dynamic range images of the low dynamic range video sequence.
14. The method of claim 9 wherein the combination is indicative of at least one of a texture, gradient, and spatial pixel value variation for the image spatial positions.
15. The method of claim 9 wherein the input sets for the mapping further comprise depth indications associated with image spatial positions, and the mapping further reflects a relationship between depth and high dynamic range pixel values.
16. A device for encoding an input image, the device comprising: a receiver for receiving the input image; a mapping generator for generating a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values for a reference low dynamic range image and a corresponding reference high dynamic range image; and an output processor for generating output encoded data by encoding the input image in response to the mapping.
17. A device as claimed in claim 16, in which the output processor is arranged to include in the output encoded data at least one of a derived mapping and a residual high dynamic range image.
18. An apparatus comprising the device of claim 17; input connection means for receiving a signal comprising the input image and feeding it to the device of claim 17; and output connection means for outputting the output encoded data stream from the device of claim 17.
19. A device for generating a high dynamic range image from a low dynamic range image, the device comprising: a receiver for receiving the low dynamic range image; a mapping processor for providing a mapping relating input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image; and an image generator for generating the high dynamic range image in response to the low dynamic range image and the mapping.
20. A device for generating a high dynamic range image from a low dynamic range image as claimed in claim 19, comprising means to receive a residual high dynamic range image and a correction unit to apply e.g. by addition the residual high dynamic range image to the high dynamic range image in response to the low dynamic range image from the mapping.
21. A device for generating a high dynamic range image from a low dynamic range image as claimed in claim 19, in which the mapping processor is further arranged to determine the mapping at least partially on the basis of a received derived mapping.
22. An apparatus comprising: the device of claim 19; input connection means for receiving the low dynamic range image and feeding it to the device of claim 19; output connection means for outputting a signal comprising the high dynamic range image from the device of claim 19.
23. An encoded signal comprising: an encoded low dynamic range image; and residual image data for the low dynamic range image, at least part of the residual image data being indicative of a difference between a desired high dynamic range image corresponding to the low dynamic range image and a predicted high dynamic range image resulting from application of a mapping to the encoded low dynamic range image, where the mapping relates input data in the form of input sets of image spatial positions and a combination of color coordinates of low dynamic range pixel values associated with the image spatial positions to output data in the form of high dynamic range pixel values, the mapping reflecting a dynamic range relationship between a reference low dynamic range image and a corresponding reference high dynamic range image.
24. An encoded signal as claimed in claim 23 further comprising at least one of a further information for specifying or modifying the mapping or information specifying properties of images such as spatial parts of images from which to determine the mapping.
25. A storage medium comprising the encoded signal of claim 23.